Welcome to Shipyard's documentation!¶
Shipyard is a directed acyclic graph controller for Kubernetes and OpenStack control plane life-cycle management, and is part of the Airship platform.
Sample Configuration File¶
The following is a sample Shipyard configuration for adaptation and use. It is auto-generated from Shipyard when this documentation is built, so if you are having issues with an option, please compare your version of Shipyard with the version of this documentation.
The sample configuration can also be viewed in file form.
[DEFAULT]
[airflow]
#
# From shipyard_api
#
# Airflow worker url scheme (string value)
#worker_endpoint_scheme = http
# Airflow worker port (integer value)
#worker_port = 8793
[armada]
#
# From shipyard_api
#
# The service type for the service playing the role of Armada. The specified
# type is used to perform the service lookup in the Keystone service catalog.
# (string value)
#service_type = armada
[base]
#
# From shipyard_api
#
# The web server for Airflow (string value)
#web_server = http://localhost:8080/
# Seconds to wait to connect to the airflow api (integer value)
#airflow_api_connect_timeout = 5
# Seconds to wait for a response from the airflow api (integer value)
#airflow_api_read_timeout = 60
# The database for shipyard (string value)
#postgresql_db = postgresql+psycopg2://shipyard:changeme@postgresql.ucp:5432/shipyard
# The database for airflow (string value)
#postgresql_airflow_db = postgresql+psycopg2://shipyard:changeme@postgresql.ucp:5432/airflow
# The SQLalchemy database connection pool size. (integer value)
#pool_size = 15
# Should DB connections be validated prior to use. (boolean value)
#pool_pre_ping = true
# How long a request for a connection should wait before one becomes available.
# (integer value)
#pool_timeout = 30
# How many connections above pool_size are allowed to be open during high
# usage. (integer value)
#pool_overflow = 10
# Time, in seconds, when a connection should be closed and re-established. -1
# for no recycling. (integer value)
#connection_recycle = -1
# The directory containing the alembic.ini file (string value)
#alembic_ini_path = /home/shipyard/shipyard
# Enable profiling of API requests. Do NOT use in production. (boolean value)
#profiler = false
[deckhand]
#
# From shipyard_api
#
# The service type for the service playing the role of Deckhand. The specified
# type is used to perform the service lookup in the Keystone service catalog.
# (string value)
#service_type = deckhand
[deployment_status_configmap]
#
# From shipyard_api
#
# Name of the Deployment Status ConfigMap (string value)
#name = deployment-status
# Namespace of the Deployment Status ConfigMap (string value)
#namespace = ucp
[document_info]
#
# From shipyard_api
#
# The name of the deployment version document that Shipyard validates (string
# value)
#deployment_version_name = deployment-version
# The schema of the deployment version document that Shipyard validates (string
# value)
#deployment_version_schema = pegleg/DeploymentData/v1
# The name of the deployment-configuration document that Shipyard expects and
# validates (string value)
#deployment_configuration_name = deployment-configuration
# The schema of the deployment-configuration document that Shipyard expects and
# validates (string value)
#deployment_configuration_schema = shipyard/DeploymentConfiguration/v1
# The schema of the deployment strategy document that Shipyard expects and
# validates. Note that the name of this document is not configurable, because
# it is controlled by a field in the deployment configuration document. (string
# value)
#deployment_strategy_schema = shipyard/DeploymentStrategy/v1
[drydock]
#
# From shipyard_api
#
# The service type for the service playing the role of Drydock. The specified
# type is used to perform the service lookup in the Keystone service catalog.
# (string value)
#service_type = physicalprovisioner
[k8s_logs]
#
# From shipyard_api
#
# Namespace of Airship Pods (string value)
#ucp_namespace = ucp
[keystone_authtoken]
#
# From keystonemiddleware.auth_token
#
# Complete "public" Identity API endpoint. This endpoint should not be an
# "admin" endpoint, as it should be accessible by all end users.
# Unauthenticated clients are redirected to this endpoint to authenticate.
# Although this endpoint should ideally be unversioned, client support in the
# wild varies. If you're using a versioned v2 endpoint here, then this should
# *not* be the same endpoint the service user utilizes for validating tokens,
# because normal end users may not be able to reach that endpoint. (string
# value)
# Deprecated group/name - [keystone_authtoken]/auth_uri
#www_authenticate_uri = <None>
# DEPRECATED: Complete "public" Identity API endpoint. This endpoint should not
# be an "admin" endpoint, as it should be accessible by all end users.
# Unauthenticated clients are redirected to this endpoint to authenticate.
# Although this endpoint should ideally be unversioned, client support in the
# wild varies. If you're using a versioned v2 endpoint here, then this should
# *not* be the same endpoint the service user utilizes for validating tokens,
# because normal end users may not be able to reach that endpoint. This option
# is deprecated in favor of www_authenticate_uri and will be removed in the S
# release. (string value)
# This option is deprecated for removal since Queens.
# Its value may be silently ignored in the future.
# Reason: The auth_uri option is deprecated in favor of www_authenticate_uri
# and will be removed in the S release.
#auth_uri = <None>
# API version of the Identity API endpoint. (string value)
#auth_version = <None>
# Interface to use for the Identity API endpoint. Valid values are "public",
# "internal" (default) or "admin". (string value)
#interface = internal
# Do not handle authorization requests within the middleware, but delegate the
# authorization decision to downstream WSGI components. (boolean value)
#delay_auth_decision = false
# Request timeout value for communicating with Identity API server. (integer
# value)
#http_connect_timeout = <None>
# How many times are we trying to reconnect when communicating with Identity
# API Server. (integer value)
#http_request_max_retries = 3
# Request environment key where the Swift cache object is stored. When
# auth_token middleware is deployed with a Swift cache, use this option to have
# the middleware share a caching backend with swift. Otherwise, use the
# ``memcached_servers`` option instead. (string value)
#cache = <None>
# Required if identity server requires client certificate (string value)
#certfile = <None>
# Required if identity server requires client certificate (string value)
#keyfile = <None>
# A PEM encoded Certificate Authority to use when verifying HTTPs connections.
# Defaults to system CAs. (string value)
#cafile = <None>
# Verify HTTPS connections. (boolean value)
#insecure = false
# The region in which the identity server can be found. (string value)
#region_name = <None>
# Optionally specify a list of memcached server(s) to use for caching. If left
# undefined, tokens will instead be cached in-process. (list value)
# Deprecated group/name - [keystone_authtoken]/memcache_servers
#memcached_servers = <None>
# In order to prevent excessive effort spent validating tokens, the middleware
# caches previously-seen tokens for a configurable duration (in seconds). Set
# to -1 to disable caching completely. (integer value)
#token_cache_time = 300
# (Optional) If defined, indicate whether token data should be authenticated or
# authenticated and encrypted. If MAC, token data is authenticated (with HMAC)
# in the cache. If ENCRYPT, token data is encrypted and authenticated in the
# cache. If the value is not one of these options or empty, auth_token will
# raise an exception on initialization. (string value)
# Possible values:
# None - <No description provided>
# MAC - <No description provided>
# ENCRYPT - <No description provided>
#memcache_security_strategy = None
# (Optional, mandatory if memcache_security_strategy is defined) This string is
# used for key derivation. (string value)
#memcache_secret_key = <None>
# (Optional) Number of seconds memcached server is considered dead before it is
# tried again. (integer value)
#memcache_pool_dead_retry = 300
# (Optional) Maximum total number of open connections to every memcached
# server. (integer value)
#memcache_pool_maxsize = 10
# (Optional) Socket timeout in seconds for communicating with a memcached
# server. (integer value)
#memcache_pool_socket_timeout = 3
# (Optional) Number of seconds a connection to memcached is held unused in the
# pool before it is closed. (integer value)
#memcache_pool_unused_timeout = 60
# (Optional) Number of seconds that an operation will wait to get a memcached
# client connection from the pool. (integer value)
#memcache_pool_conn_get_timeout = 10
# (Optional) Use the advanced (eventlet safe) memcached client pool. (boolean
# value)
#memcache_use_advanced_pool = true
# (Optional) Indicate whether to set the X-Service-Catalog header. If False,
# middleware will not ask for service catalog on token validation and will not
# set the X-Service-Catalog header. (boolean value)
#include_service_catalog = true
# Used to control the use and type of token binding. Can be set to: "disabled"
# to not check token binding. "permissive" (default) to validate binding
# information if the bind type is of a form known to the server and ignore it
# if not. "strict" like "permissive" but if the bind type is unknown the token
# will be rejected. "required" any form of token binding is needed to be
# allowed. Finally the name of a binding method that must be present in tokens.
# (string value)
#enforce_token_bind = permissive
# A choice of roles that must be present in a service token. Service tokens are
# allowed to request that an expired token can be used and so this check should
# tightly control that only actual services should be sending this token. Roles
# here are applied as an ANY check so any role in this list must be present.
# For backwards compatibility reasons this currently only affects the
# allow_expired check. (list value)
#service_token_roles = service
# For backwards compatibility reasons we must let valid service tokens pass
# that don't pass the service_token_roles check as valid. Setting this true
# will become the default in a future release and should be enabled if
# possible. (boolean value)
#service_token_roles_required = false
# The name or type of the service as it appears in the service catalog. This is
# used to validate tokens that have restricted access rules. (string value)
#service_type = <None>
# Authentication type to load (string value)
# Deprecated group/name - [keystone_authtoken]/auth_plugin
#auth_type = <None>
# Config Section from which to load plugin specific options (string value)
#auth_section = <None>
#
# From shipyard_api
#
# PEM encoded Certificate Authority to use when verifying HTTPs connections.
# (string value)
#cafile = <None>
# PEM encoded client certificate cert file (string value)
#certfile = <None>
# PEM encoded client certificate key file (string value)
#keyfile = <None>
# Verify HTTPS connections. (boolean value)
#insecure = false
# Timeout value for http requests (integer value)
#timeout = <None>
# Collect per-API call timing information. (boolean value)
#collect_timing = false
# Log requests to multiple loggers. (boolean value)
#split_loggers = false
[logging]
#
# From shipyard_api
#
# The default logging level for the root logger. ERROR=40, WARNING=30, INFO=20,
# DEBUG=10 (integer value)
#log_level = 10
# The logging levels for named loggers. Use standard representations for
# logging levels: ERROR, WARN, INFO, DEBUG. Configuration file format:
# named_log_levels = keystoneauth:INFO,othlgr:WARN (dict value)
#named_log_levels = keystoneauth:20,keystonemiddleware:20
[promenade]
#
# From shipyard_api
#
# The service type for the service playing the role of Promenade. The specified
# type is used to perform the service lookup in the Keystone service catalog.
# (string value)
#service_type = kubernetesprovisioner
[requests_config]
#
# From shipyard_api
#
# Airflow logs retrieval connect timeout (in seconds) (integer value)
#airflow_log_connect_timeout = 5
# Airflow logs retrieval timeout (in seconds) (integer value)
#airflow_log_read_timeout = 300
# Airship component validation connect timeout (in seconds) (integer value)
#validation_connect_timeout = 5
# Airship component validation timeout (in seconds) (integer value)
#validation_read_timeout = 300
# Maximum time to wait to connect to a note source URL (in seconds) (integer
# value)
#notes_connect_timeout = 5
# Read timeout for a note source URL (in seconds) (integer value)
#notes_read_timeout = 10
# Deckhand client connect timeout (in seconds) (integer value)
#deckhand_client_connect_timeout = 5
# Deckhand client timeout (in seconds) for GET, PUT, POST and DELETE request
# (integer value)
#deckhand_client_read_timeout = 300
# Connect timeout used for connecting to Drydock using the Drydock client (in
# seconds) (integer value)
#drydock_client_connect_timeout = 20
# Read timeout used for responses from Drydock using the Drydock client (in
# seconds) (integer value)
#drydock_client_read_timeout = 300
[shipyard]
#
# From shipyard_api
#
# The service type for the service playing the role of Shipyard. The specified
# type is used to perform the service lookup in the Keystone service catalog.
# (string value)
#service_type = shipyard
[validations]
#
# From shipyard_api
#
# Control the severity of the deployment-version validation during create
# configdocs. (string value)
# Possible values:
# Skip - Skip the validation altogether
# Info - Print an Info level message if the validation fails
# Warning - Print a Warning level message if the validation fails
# Error - Return an error when the validation fails and prevent the configdocs
# create from proceeding
#deployment_version_create = Skip
# Control the severity of the deployment-version validation during
# commit configdocs. (string value)
# Possible values:
# Skip - Skip the validation altogether
# Info - Print an Info level message if the validation fails
# Warning - Print a Warning level message if the validation fails
# Error - Return an error when the validation fails and prevent the commit from
# proceeding
#deployment_version_commit = Skip
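The named_log_levels option in the [logging] section above uses a compact name:level format. The following sketch illustrates how such a value maps to Python's numeric logging levels (an illustration only, not Shipyard's actual option parsing):

```python
import logging

def parse_named_log_levels(value):
    """Parse a string like 'keystoneauth:INFO,othlgr:WARN' into
    {logger_name: numeric_level}, accepting either symbolic names
    (INFO) or numbers (20)."""
    levels = {}
    for pair in value.split(","):
        name, level = pair.split(":")
        levels[name] = int(level) if level.isdigit() else logging.getLevelName(level)
    return levels

print(parse_named_log_levels("keystoneauth:INFO,othlgr:WARN"))
# {'keystoneauth': 20, 'othlgr': 30}
```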
Shipyard API¶
Logically, the API has several parts, each to handle each area of Shipyard functionality:
- Document Staging
- Action Handling
- Airflow Monitoring
- Site Statuses
- Logs Retrieval
- Notes Handling
Standards used by the API¶
See API Conventions
Query Parameters¶
Query parameters are mostly specific to a Shipyard API resource, but the following are reused to provide a more consistent interface:
- verbosity
?verbosity=1
Provides the user some control over the level of details provided in a response, with values ranging from 0 (none) to 5 (most). Only some resources are affected by setting verbosity, but all resources will accept the parameter. Setting the verbosity parameter to 0 will instruct the resource to turn off all optional data being returned. The default verbosity level is 1 (summary).
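For illustration, one plausible way a resource could interpret the parameter is sketched below (the function name is hypothetical, and the real handling is internal to Shipyard, which may instead reject out-of-range values):

```python
def effective_verbosity(raw, default=1):
    """Interpret a ?verbosity= query value: integers 0 (none) through
    5 (most) are honored; anything else falls back to the default (1,
    summary)."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    return value if 0 <= value <= 5 else default
```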
Notes on examples¶
Examples assume the following environment variables are set before issuing the curl commands shown:
$TOKEN={a valid keystone token}
$URL={the url and port of the shipyard api}
- Examples will use JSON formatted by the jq command for the sake of presentation.
- Actual responses will not be formatted.
- An ellipsis indicates a repeated structure in the case of lists, or prior/subsequent structure unimportant to the example (or considered understood).
- The content-length response headers have been removed so as to not cause confusion with the listed output.
Example response for an invalid token:¶
HTTP/1.1 401 Unauthorized
content-type: application/json
x-shipyard-req: a8194b97-8973-4b04-a3b3-2bd319024c5d
WWW-Authenticate: Keystone uri='http://keystone-api.ucp.svc.cluster.local:80/v3'
{
"apiVersion": "v1.0",
"status": "Failure",
"metadata": {},
"message": "Unauthenticated",
"code": "401 Unauthorized",
"details": {
"errorList": [
{
"message": "Credentials are not established"
}
],
"errorCount": 1,
"errorType": "ApiError"
},
"kind": "status",
"reason": "Credentials are not established"
}
Document Staging API¶
Shipyard will serve as the entrypoint for documents (designs, secrets, configurations, etc.) into a site. Documents are posted to Shipyard in collections, rather than individually. At any point in time, there will be several versions of documents in a site that are accessible via this API:
- The "Committed Documents" version, which represents the last version of documents that were successfully committed with a commit_configdocs action.
- The "Shipyard Buffer" version, which represents the collection of documents that have been ingested by this API since the last committed version. Note that only one set of documents may be posted to the buffer at a time by default. (This behavior can be overridden by query parameters issued by the user of Shipyard.)
- The "Last Site Action" version represents the version of documents associated with the last successful or failed site action.
- The "Successful Site Action" version represents the version of documents associated with the last successful site action.
- Site actions include deploy_site, update_site, and update_software.
All versions of documents rely upon Deckhand for storage. Shipyard uses the tagging features of Deckhand to find the appropriate Committed Documents, Last Site Action, Successful Site Action, and Shipyard Buffer versions.
/v1.0/configdocs¶
Represents the site configuration documents' current statuses
GET /v1.0/configdocs¶
Returns a list of collections including their base and new status.
Note
The output type for this request is 'Content-Type: application/json'
Query Parameters¶
- version=committed,buffer (default) Indicates which revision tags to compare. Comparison can only be done between 2 different revision tags, and the default behavior is to compare the revision with the 'committed' tag and the one with the 'buffer' tag. Valid revision tags that can be used for comparison using the API include 'buffer', 'committed', 'last_site_action' and 'successful_site_action'.
Responses¶
- 200 OK
- If documents can be retrieved
/v1.0/configdocs/{collection_id}¶
Represents the site configuration documents
Entity Structure¶
The documents as noted above (commonly yaml), in a format understood by Deckhand
POST /v1.0/configdocs/{collection_id}¶
Ingests a collection of documents. Synchronous. If a POST to commitconfigdocs is already in progress, this POST will be rejected with a 409 error.
Note
The expected input type for this request is ‘Content-Type: application/x-yaml’
Query Parameters¶
- buffermode=append|replace|rejectOnContents
Indicates how the existing Shipyard Buffer should be handled. By default,
Shipyard will reject the POST if contents already exist in the Shipyard
Buffer.
- append: Add the collection to the Shipyard Buffer, only if that collection doesn’t already exist in the Shipyard Buffer. If the collection is already present, the request will be rejected and a 409 Conflict will be returned.
- replace: Clear the Shipyard Buffer before adding the specified collection.
- empty-collection: Set to true to indicate that this collection should be made empty and effectively deleted when the Shipyard Buffer is committed. If this parameter is specified, the POST body will be ignored.
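The buffermode behavior above can be summarized as a small decision function. This is a sketch with hypothetical names; the real handler lives inside Shipyard:

```python
def buffer_decision(buffermode, buffer_contents, collection_id):
    """Return 'accept' or 'reject' for a POST of collection_id, given
    the set of collection ids currently in the Shipyard Buffer."""
    if buffermode == "replace":
        return "accept"          # buffer is cleared before adding
    if buffermode == "append":
        # reject only if this collection is already buffered
        return "reject" if collection_id in buffer_contents else "accept"
    # default: rejectOnContents - reject if anything is buffered
    return "reject" if buffer_contents else "accept"
```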
Responses¶
- 201 Created
If the documents are successfully ingested, even with validation failures. Response message includes:
- a list of validation results
- The response headers will include a Location indicating the GET endpoint to retrieve the configDocs
- 400 Bad Request
When:
- The request is missing a message body, attempting to create a collection with no contents.
- The request has no new/changed contents for the collection.
- The request is missing a Content-Length header.
- The provided document(s) fail Shipyard/Deckhand validations.
- 409 Conflict
A condition in the system is blocking this document ingestion
- If a commitconfigdocs POST is in progress.
- If any collections exist in the Shipyard Buffer unless buffermode=replace or buffermode=append.
- If buffermode=append, and the collection being posted is already in the Shipyard Buffer
GET /v1.0/configdocs/{collection_id}¶
Returns the source documents for a collection of documents
Note
The output type for this request is ‘Content-Type: application/x-yaml’
Query Parameters¶
- version=committed | last_site_action | successful_site_action | buffer Return the documents for the version specified - buffer by default.
- cleartext-secrets=true/false If true then returns cleartext secrets in encrypted documents, otherwise those values are redacted.
Responses¶
- 200 OK
If documents can be retrieved.
- If the response is 200 with an empty response body, this indicates that the buffer version is attempting to ‘delete’ the collection when it is committed. An empty response body will only be possible for version=buffer.
- 404 Not Found
If the collection is not represented
- When version=buffer, this indicates that no representations of this collection have been POSTed since the last committed version.
- When version=committed, this indicates that either the collection has never existed or has been deleted by a prior commit.
/v1.0/renderedconfigdocs¶
Represents the site configuration documents, as a whole set - does not consider collections in any way.
GET /v1.0/renderedconfigdocs¶
Returns the full set of configdocs in their rendered form.
Note
The output type for this request is 'Content-Type: application/x-yaml'
Query Parameters¶
- version=committed | last_site_action | successful_site_action | buffer Return the documents for the version specified - buffer by default.
- cleartext-secrets=true/false If true then returns cleartext secrets in encrypted documents, otherwise those values are redacted.
Responses¶
- 200 OK
- If documents can be retrieved.
/v1.0/commitconfigdocs¶
An RPC style command to trigger a commit of the configuration documents from the Shipyard Buffer to the Committed Documents. This resource will support POST only.
Entity Structure¶
The response will be the list of validations from all downstream systems that perform validation during the commit process. The structure will match the error response object described in the API Conventions and will be an aggregation of each validating component’s responses.
POST /v1.0/commitconfigdocs¶
Synchronous. Performs the commit of the Shipyard Buffer to the Committed Documents. This invokes each of the validating components to examine the Shipyard Buffer version of the configuration documents and aggregate the responses. While performing this commit, further POSTing of configdocs, or other commits may not be invoked (Shipyard will block those requests with a 409 response). If there are any failures to validate, the Shipyard Buffer and Committed Documents will remain unchanged. If successful, the Shipyard Buffer will be cleared, and the Committed documents will be updated.
Note
If there are unhandled runtime errors during the commitconfigdocs POST, a deadlock situation may be possible. Future enhancements may improve this handling.
Query Parameters¶
- force=true | false
- By default, false, if there are validation failures the POST will fail with a 400 response. With force=true, allows for the commit to succeed (with a 200 response) even if there are validation failures from downstream components. The aggregate response of validation failures will be returned in this case, but the invalid documents will still be moved from the Shipyard Buffer to the Committed Documents.
- dryrun=true | false
- By default, false. With dryrun=true, the response will contain the validation status for the contents of the buffer. The Shipyard Buffer will not be committed.
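The interaction of force and dryrun with validation results can be sketched as follows (a hypothetical helper, not Shipyard's handler; status codes follow the Responses section below):

```python
def commit_response(validation_errors, force=False, dryrun=False):
    """Return (status_code, committed) for a commitconfigdocs POST,
    per the force/dryrun semantics described above."""
    if dryrun:
        return 200, False    # report validation status, no commit
    if validation_errors and not force:
        return 400, False    # commit blocked by validation failures
    return 200, True         # committed (even with failures, if forced)
```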
Responses¶
- 200 OK
- If the validations are successful. Returns an “empty” structure as a response, indicating no errors. A 200 may also be returned if there are validation failures but the force=true query parameter was specified; in this case, the response will contain the list of validations.
- 400 Bad Request
- If the validations fail. Returns a populated response structure containing the aggregation of the failed validations.
- 409 Conflict
- If there is a POST to commitconfigdocs in progress.
Example¶
{
"apiVersion": "v1",
"code": "400 Bad Request",
"details": {
"errorCount": 2,
"messageList": [
{
"error": true,
"message": "Error loading effective site: 'NoneType' object is not iterable",
"name": "Drydock"
},
{
"error": true,
"message": "Armada unable to validate configdocs",
"name": "Armada"
}
]
},
"kind": "Status",
"message": "Validations failed",
"metadata": {},
"reason": "Validation",
"status": "Failure"
}
Action API¶
The Shipyard Action API is a resource that allows for creation, control and investigation of triggered workflows. These actions encapsulate a command interface for the Airship Undercloud Platform. See Action Commands for supported actions.
/v1.0/actions¶
Entity Structure¶
A list of actions that have been executed through Shipyard's Action API.
[
{ Action objects summarized, See below},
...
]
GET /v1.0/actions¶
Returns the list of actions in the system that have been posted, and are accessible to the current user.
Responses¶
- 200 OK
- If the actions can be retrieved.
Example¶
$ curl -X GET $URL/api/v1.0/actions -H "X-Auth-Token:$TOKEN"
HTTP/1.1 200 OK
x-shipyard-req: 0804d13e-08fc-4e60-a819-3b7532cac4ec
content-type: application/json; charset=UTF-8
[
{
"dag_status": "failed",
"parameters": {},
"steps": [
{
"id": "action_xcom",
"url": "/actions/01BTP9T2WCE1PAJR2DWYXG805V/steps/action_xcom",
"index": 1,
"state": "success"
},
{
"id": "dag_concurrency_check",
"url": "/actions/01BTP9T2WCE1PAJR2DWYXG805V/steps/dag_concurrency_check",
"index": 2,
"state": "success"
},
{
"id": "preflight",
"url": "/actions/01BTP9T2WCE1PAJR2DWYXG805V/steps/preflight",
"index": 3,
"state": "failed"
},
...
],
"action_lifecycle": "Failed",
"dag_execution_date": "2017-09-23T02:42:12",
"id": "01BTP9T2WCE1PAJR2DWYXG805V",
"dag_id": "deploy_site",
"datetime": "2017-09-23 02:42:06.860597+00:00",
"user": "shipyard",
"context_marker": "416dec4b-82f9-4339-8886-3a0c4982aec3",
"name": "deploy_site"
},
...
]
POST /v1.0/actions¶
Creates an action in the system. This will trigger a workflow to start. The input body to this POST represents an action object that has at least these fields:
- name
- The name of the action to invoke, as noted in Action Commands
- parameters
A dictionary of parameters to use for the trigger invocation. The supported parameters will vary for the action invoked.
{ "name" : "action name", "parameters" : { varies by action } }
The POST will synchronously create the action (a shell object that represents a DAG invocation), perform any checks to validate the preconditions to run the DAG, and trigger the invocation of the DAG. The DAG will run asynchronously in airflow.
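A hypothetical client-side sketch of this POST using Python's standard library (the URL and token values are placeholders, standing in for the $URL and $TOKEN of the curl examples):

```python
import json
from urllib import request

url = "http://shipyard-api.example/api/v1.0/actions"   # placeholder for $URL
token = "example-token"                                # placeholder for $TOKEN

# Minimal action body: a name from Action Commands plus parameters,
# which vary by action.
body = json.dumps({"name": "deploy_site", "parameters": {}}).encode()
req = request.Request(url, data=body, method="POST", headers={
    "X-Auth-Token": token,
    "Content-Type": "application/json",
})
# request.urlopen(req) would return the 201 Created response containing
# the action entity.
```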
Query Parameters¶
- allow-intermediate-commits=true | false
- By default, false. The user will not be able to continue with a site action, e.g. update_site, if the current committed revision of documents has other prior commits that have not been used as part of a site action. With allow-intermediate-commits=true, the user can override the default behavior and continue with the site action. This may be the case when the user is aware of the existence of such commits and/or when such commits are intended.
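The underlying check can be pictured as follows (a sketch with hypothetical names and numeric revisions; Shipyard actually tracks these via Deckhand revision tags):

```python
def has_intermediate_commits(committed, last_site_action, committed_revisions):
    """True if any committed revision lies strictly between the revision
    used by the last site action and the currently committed revision."""
    return any(last_site_action < r < committed for r in committed_revisions)
```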
Responses¶
- 201 Created
- If the action is created successfully, and all preconditions to run the DAG are successful. The response body is the action entity created.
- 400 Bad Request
- If the action name doesn't exist, or the input entity is otherwise malformed.
- 409 Conflict
- For any failed pre-run validations. The response body is the action entity created, with the failed validations. The DAG will not begin execution in this case.
Example¶
$ curl -D - -d '{"name":"deploy_site"}' -X POST $URL/api/v1.0/actions \
-H "X-Auth-Token:$TOKEN" -H "content-type:application/json"
HTTP/1.1 201 Created
location: {$URL}/api/v1.0/actions/01BTTMFVDKZFRJM80FGD7J1AKN
x-shipyard-req: 629f2ea2-c59d-46b9-8641-7367a91a7016
content-type: application/json; charset=UTF-8
{
"dag_status": "SCHEDULED",
"parameters": {},
"dag_execution_date": "2017-09-24T19:05:49",
"id": "01BTTMFVDKZFRJM80FGD7J1AKN",
"dag_id": "deploy_site",
"name": "deploy_site",
"user": "shipyard",
"context_marker": "629f2ea2-c59d-46b9-8641-7367a91a7016",
"timestamp": "2017-09-24 19:05:43.603591"
}
/v1.0/actions/{action_id}¶
Each action is assigned a unique id that can be used to get details for the action, including the execution status.
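The id is a 26-character ULID in Crockford base32. A minimal format check, for illustration only (ids are generated server-side):

```python
import re

# Crockford base32 excludes I, L, O and U to avoid ambiguity.
ULID_RE = re.compile(r"^[0-9A-HJKMNP-TV-Z]{26}$")

assert ULID_RE.match("01BTTMFVDKZFRJM80FGD7J1AKN")
assert not ULID_RE.match("not-a-valid-action-id")
```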
Entity Structure¶
All actions will include fields that indicate the following data:
- action_lifecycle
A summarized value indicating the status or lifecycle phase of the action.
- Pending - The action is scheduled or preparing for execution.
- Processing - The action is underway.
- Complete - The action has completed successfully.
- Failed - The action has encountered an error, and has failed.
- Paused - The action has been paused by a user.
- command audit
- A list of commands that have been issued against the action. Initially, the command listed will be “invoke”, but may include “pause”, “unpause”, or “stop” if those commands are issued.
- context_marker
- The user supplied or system assigned context marker associated with the action
- dag_execution_date
- The execution date assigned by the workflow system during action creation.
- dag_status
- Represents the status that airflow provides for an executing DAG.
- datetime
- The time at which the action was invoked.
- id
- The identifier for the action, a 26 character ULID assigned during the creation of the action.
- name
- The name of the action, e.g.: deploy_site.
- parameters
- The parameters configuring the action that were supplied by the user during action creation.
- steps
- The list of steps for the action, including the status for that step.
- user
- The user who has invoked this action, as acquired from the authorization token.
- validations
- A list of validations that have been done, including any status information for those validations. During the lifecycle of the action, this list of validations may continue to grow.
GET /v1.0/actions/{action_id}¶
Returns the action entity for the specified id.
Responses¶
200 OK
Example¶
$ curl -D - -X GET $URL/api/v1.0/actions/01BTTMFVDKZFRJM80FGD7J1AKN \
-H "X-Auth-Token:$TOKEN"
HTTP/1.1 200 OK
x-shipyard-req: eb3eacb3-4206-40df-bd91-2a3a6d81cd02
content-type: application/json; charset=UTF-8
{
"name": "deploy_site",
"dag_execution_date": "2017-09-24T19:05:49",
"validations": [],
"id": "01BTTMFVDKZFRJM80FGD7J1AKN",
"dag_id": "deploy_site",
"command_audit": [
{
"id": "01BTTMG16R9H3Z4JVQNBMRV1MZ",
"action_id": "01BTTMFVDKZFRJM80FGD7J1AKN",
"datetime": "2017-09-24 19:05:49.530223+00:00",
"user": "shipyard",
"command": "invoke"
}
],
"user": "shipyard",
"context_marker": "629f2ea2-c59d-46b9-8641-7367a91a7016",
"datetime": "2017-09-24 19:05:43.603591+00:00",
"dag_status": "failed",
"parameters": {},
"steps": [
{
"id": "action_xcom",
"url": "/actions/01BTTMFVDKZFRJM80FGD7J1AKN/steps/action_xcom",
"index": 1,
"state": "success"
},
{
"id": "dag_concurrency_check",
"url": "/actions/01BTTMFVDKZFRJM80FGD7J1AKN/steps/dag_concurrency_check",
"index": 2,
"state": "success"
},
{
"id": "preflight",
"url": "/actions/01BTTMFVDKZFRJM80FGD7J1AKN/steps/preflight",
"index": 3,
"state": "failed"
},
{
"id": "deckhand_get_design_version",
"url": "/actions/01BTTMFVDKZFRJM80FGD7J1AKN/steps/deckhand_get_design_version",
"index": 4,
"state": null
},
...
],
"action_lifecycle": "Failed"
}
/v1.0/actions/{action_id}/validations/{validation_id}¶
Allows for drilldown to validation detailed info.
/v1.0/actions/{action_id}/steps/{step_id}¶
Allows for drilldown to step information. The step information includes details of the step's execution, successful or not, and enough information to facilitate troubleshooting as easily as possible.
Entity Structure¶
A step entity represents detailed information representing a single step of execution as part of an action. Not all fields are necessarily represented in every returned entity.
- dag_id
- The name/id of the workflow DAG that contains this step.
- duration
- The duration (seconds) for the step.
- end_date
- The timestamp of the completion of the step.
- execution_date
- The execution date of the workflow that contains this step.
- index
- The numeric value representing the position of this step in the sequence of steps associated with this action.
- operator
- The name of the processing facility used by the workflow system.
- queued_dttm
- The timestamp when the step was enqueued by the workflow system.
- start_date
- The timestamp for the beginning of execution for this step.
- state
- The execution state of the step.
- task_id
- The name of the task used by the workflow system (also the step name queried in the request).
- try_number
- The number of attempts made for this step. Some workflow steps may be configured to retry before considering the step truly failed.
GET /v1.0/actions/{action_id}/steps/{step_id}¶
Returns the details for a step, by id, for the given action id.
Responses¶
200 OK
Example¶
$ curl -D - \
-X GET $URL/api/v1.0/actions/01BTTMFVDKZFRJM80FGD7J1AKN/steps/action_xcom \
-H "X-Auth-Token:$TOKEN"
HTTP/1.1 200 OK
x-shipyard-req: 72daca4d-1f79-4e08-825f-2ad181912a47
content-type: application/json; charset=UTF-8
{
"end_date": "2017-09-24 19:05:59.446213",
"duration": 0.165181,
"queued_dttm": "2017-09-24 19:05:52.993983",
"operator": "PythonOperator",
"try_number": 1,
"task_id": "action_xcom",
"state": "success",
"execution_date": "2017-09-24 19:05:49",
"dag_id": "deploy_site",
"index": 1,
"start_date": "2017-09-24 19:05:59.281032"
}
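The duration field is the difference between start_date and end_date in seconds; for instance, the example step above can be cross-checked with a short sketch (timestamps copied from the response above):

```python
from datetime import datetime

def step_duration_seconds(step):
    """Recompute a step's duration from its start/end timestamps."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    start = datetime.strptime(step["start_date"], fmt)
    end = datetime.strptime(step["end_date"], fmt)
    return (end - start).total_seconds()

# Fields copied from the example step response above
step = {
    "start_date": "2017-09-24 19:05:59.281032",
    "end_date": "2017-09-24 19:05:59.446213",
    "duration": 0.165181,
}
assert abs(step_duration_seconds(step) - step["duration"]) < 1e-6
```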
/v1.0/actions/{action_id}/control/{control_verb}¶
Allows for issuing DAG controls against an action.
Entity Structure¶
None; there is no associated response entity for this resource.
POST /v1.0/actions/{action_id}/control/{control_verb}¶
Triggers a control action against an action. Supported control verbs: pause, unpause.
Responses¶
202 Accepted
Example¶
Failure case - command is invalid for the execution state of the action.
$ curl -D - \
-X POST $URL/api/v1.0/actions/01BTTMFVDKZFRJM80FGD7J1AKN/control/pause \
-H "X-Auth-Token:$TOKEN"
HTTP/1.1 409 Conflict
content-type: application/json
x-shipyard-req: 9c9551e0-335c-4297-af93-8440cc6b324f
{
"apiVersion": "v1.0",
"status": "Failure",
"metadata": {},
"message": "Unable to pause action",
"code": "409 Conflict",
"details": {
"errorList": [
{
"message": "dag_run state must be running, but is failed"
}
],
"errorCount": 1,
"errorType": "ApiError"
},
"kind": "status",
"reason": "dag_run state must be running, but is failed"
}
Success case
$ curl -D - \
-X POST $URL/api/v1.0/actions/01BTTMFVDKZFRJM80FGD7J1AKN/control/pause \
-H "X-Auth-Token:$TOKEN"
HTTP/1.1 202 Accepted
content-length: 0
x-shipyard-req: 019fae1c-03b0-4af1-b57d-451ae6ddac77
content-type: application/json; charset=UTF-8
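As the 409 example above shows, a control verb is only valid in certain execution states (pause requires a running dag_run, and unpause applies to a paused one). A client-side precheck might look like the following sketch; the verb-to-state mapping here is inferred from the error messages and the CLI documentation, not a published contract:

```python
# Illustrative only: the mapping is inferred from the 409 error above,
# not from a published Shipyard contract.
VALID_STATES_FOR_VERB = {
    "pause": {"running"},
    "unpause": {"paused"},
}

def control_allowed(verb, dag_status):
    """Return True if the control verb is expected to succeed."""
    return dag_status in VALID_STATES_FOR_VERB.get(verb, set())

assert not control_allowed("pause", "failed")   # the 409 case above
assert control_allowed("pause", "running")
```

Even with such a precheck, a client must still handle the 409 response, since the action's state can change between the check and the POST.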
Airflow Monitoring API¶
Airflow's primary function is scheduling DAGs, as opposed to Shipyard's primary use case of triggering DAGs. Shipyard provides functionality to allow an operator to monitor and review these scheduled workflows (DAGs) in addition to the ones triggered by Shipyard. This API allows access to Airflow DAGs of any type, providing a peek into the totality of what is happening in Airflow.
/v1.0/workflows¶
The resource that represents DAGs (workflows) in Airflow.
Entity Structure¶
A list of objects representing the DAGs that have run in Airflow.
GET /v1.0/workflows¶
Queries Airflow for DAGs that are running or have run (successfully or unsuccessfully) and provides a summary of those workflows.
Query parameters¶
- since={iso8601 date (past) or duration}
- optional, a boundary in the past within which to retrieve results. Default is 30 days in the past.
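A client that wants to mirror the default boundary can compute a timestamp 30 days in the past; a minimal sketch (the full set of accepted since formats is not enumerated here, so treat this as one plausible value):

```python
from datetime import datetime, timedelta, timezone

def default_since(days=30):
    """ISO 8601 timestamp for the default 'since' boundary (30 days back)."""
    return (datetime.now(timezone.utc) - timedelta(days=days)).isoformat()

# e.g. GET $URL/api/v1.0/workflows?since=<this value>
print(default_since())
```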
Responses¶
200 OK
Example¶
Notice the workflow_id values; these can be used for drilldown.
curl -D - -X GET $URL/api/v1.0/workflows -H "X-Auth-Token:$TOKEN"
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
x-shipyard-req: 3ab4ccc6-b956-4c7a-9ae6-183c562d8297
[
{
"execution_date": "2017-10-09 21:18:56",
"end_date": null,
"workflow_id": "deploy_site__2017-10-09T21:18:56.000000",
"start_date": "2017-10-09 21:18:56.685999",
"external_trigger": true,
"dag_id": "deploy_site",
"state": "failed",
"run_id": "manual__2017-10-09T21:18:56"
},
{
"execution_date": "2017-10-09 21:19:03",
"end_date": null,
"workflow_id": "deploy_site__2017-10-09T21:19:03.000000",
"start_date": "2017-10-09 21:19:03.361522",
"external_trigger": true,
"dag_id": "deploy_site",
"state": "failed",
"run_id": "manual__2017-10-09T21:19:03"
}
...
]
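Given a response like the one above, a client can collect the workflow_id values of failed, externally triggered runs for drilldown; a small parsing sketch using trimmed data from the example:

```python
def failed_external_ids(workflows):
    """workflow_ids of failed, externally triggered runs, for drilldown."""
    return [wf["workflow_id"] for wf in workflows
            if wf["external_trigger"] and wf["state"] == "failed"]

workflows = [  # trimmed from the example response above
    {"workflow_id": "deploy_site__2017-10-09T21:18:56.000000",
     "external_trigger": True, "state": "failed"},
    {"workflow_id": "deploy_site__2017-10-09T21:19:03.000000",
     "external_trigger": True, "state": "failed"},
]
print(failed_external_ids(workflows))
```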
/v1.0/workflows/{workflow_id}¶
Entity Structure¶
An object representing the information available from airflow regarding a DAG’s execution
GET /v1.0/workflows/{id}¶
Further details of a particular workflow's steps. All steps of all sub-DAGs will be included in the list of steps, as well as a section indicating the sub-DAGs for this parent workflow.
Responses¶
200 OK
Example¶
Note
Sub-DAGs can be queried to restrict results to only that sub-DAG's steps, e.g. by using this as {workflow_id}: deploy_site.preflight.armada_preflight_check__2017-10-09T21:19:03.000000
curl -D - \
-X GET $URL/api/v1.0/workflows/deploy_site__2017-10-09T21:19:03.000000 \
-H "X-Auth-Token:$TOKEN"
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
x-shipyard-req: 98d71530-816a-4692-9df2-68f22c057467
{
"execution_date": "2017-10-09 21:19:03",
"end_date": null,
"workflow_id": "deploy_site__2017-10-09T21:19:03.000000",
"start_date": "2017-10-09 21:19:03.361522",
"external_trigger": true,
"steps": [
{
"end_date": "2017-10-09 21:19:14.916220",
"task_id": "action_xcom",
"start_date": "2017-10-09 21:19:14.798053",
"duration": 0.118167,
"queued_dttm": "2017-10-09 21:19:08.432582",
"try_number": 1,
"state": "success",
"operator": "PythonOperator",
"dag_id": "deploy_site",
"execution_date": "2017-10-09 21:19:03"
},
{
"end_date": "2017-10-09 21:19:25.283785",
"task_id": "dag_concurrency_check",
"start_date": "2017-10-09 21:19:25.181492",
"duration": 0.102293,
"queued_dttm": "2017-10-09 21:19:19.283132",
"try_number": 1,
"state": "success",
"operator": "ConcurrencyCheckOperator",
"dag_id": "deploy_site",
"execution_date": "2017-10-09 21:19:03"
},
{
"end_date": "2017-10-09 21:20:05.394677",
"task_id": "preflight",
"start_date": "2017-10-09 21:19:34.994775",
"duration": 30.399902,
"queued_dttm": "2017-10-09 21:19:28.449848",
"try_number": 1,
"state": "failed",
"operator": "SubDagOperator",
"dag_id": "deploy_site",
"execution_date": "2017-10-09 21:19:03"
},
...
],
"dag_id": "deploy_site",
"state": "failed",
"run_id": "manual__2017-10-09T21:19:03",
"sub_dags": [
{
"execution_date": "2017-10-09 21:19:03",
"end_date": null,
"workflow_id": "deploy_site.preflight__2017-10-09T21:19:03.000000",
"start_date": "2017-10-09 21:19:35.082479",
"external_trigger": false,
"dag_id": "deploy_site.preflight",
"state": "failed",
"run_id": "backfill_2017-10-09T21:19:03"
},
...,
{
"execution_date": "2017-10-09 21:19:03",
"end_date": null,
"workflow_id": "deploy_site.preflight.armada_preflight_check__2017-10-09T21:19:03.000000",
"start_date": "2017-10-09 21:19:48.265023",
"external_trigger": false,
"dag_id": "deploy_site.preflight.armada_preflight_check",
"state": "failed",
"run_id": "backfill_2017-10-09T21:19:03"
}
]
}
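The sub_dags section supports the drilldown described in the note above; for example, a client could gather the workflow_id of each failed sub-DAG (data trimmed from the example response):

```python
def failed_subdag_ids(workflow):
    """workflow_ids of failed sub-DAGs, usable as {workflow_id} for drilldown."""
    return [sd["workflow_id"] for sd in workflow.get("sub_dags", [])
            if sd["state"] == "failed"]

workflow = {  # trimmed from the example response above
    "workflow_id": "deploy_site__2017-10-09T21:19:03.000000",
    "sub_dags": [
        {"workflow_id": "deploy_site.preflight__2017-10-09T21:19:03.000000",
         "state": "failed"},
        {"workflow_id": "deploy_site.preflight.armada_preflight_check"
                        "__2017-10-09T21:19:03.000000",
         "state": "failed"},
    ],
}
print(failed_subdag_ids(workflow))
```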
Site Statuses API¶
The Site Statuses API retrieves the node provision status and/or node power state for all nodes in the site.
/v1.0/site_statuses¶
GET /v1.0/site_statuses¶
Returns a dictionary with the provision status of nodes and the power state of machines.
Query parameters¶
- filters=nodes-provision-status,machines-power-state
- optional, specifies one or more status types to return. The filter value nodes-provision-status will fetch the provisioning status of all nodes in the site. The filter value machines-power-state will fetch the power state of all baremetal machines in the site. If the filters query parameter is omitted, statuses of all status types will be returned. To specify multiple items explicitly, separate them with the URL-encoded version of a comma (%2C), e.g.: filters=nodes-provision-status%2Cmachines-power-state
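The %2C is ordinary URL encoding of the comma, so a client library will produce it automatically; for example, with Python's standard library:

```python
from urllib.parse import urlencode

# Encode the multi-type filter for the site_statuses query string
query = urlencode({"filters": "nodes-provision-status,machines-power-state"})
print(query)  # filters=nodes-provision-status%2Cmachines-power-state
```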
Responses¶
- 200 OK
- If statuses are retrieved successfully.
- 400 Bad Request
- If invalid filters option is given.
Example¶
$ curl -X GET $URL/api/v1.0/site_statuses -H "X-Auth-Token:$TOKEN"
HTTP/1.1 200 OK
x-shipyard-req: 0804d13e-08fc-4e60-a819-3b7532cac4ec
content-type: application/json; charset=UTF-8
{
"nodes-provision-status": [
{
"hostname": "abc.xyz.com",
"status": "Ready"
},
{
"hostname": "def.xyz.com",
"status": "Ready"
}
],
"machines-power-state": [
{
"hostname": "abc.xyz.com",
"power_state": "On"
},
{
"hostname": "def.xyz.com",
"power_state": "On"
}
]
}
$ curl -X GET $URL/api/v1.0/site_statuses?filters=nodes-provision-status \
-H "X-Auth-Token:$TOKEN"
HTTP/1.1 200 OK
x-shipyard-req: 0804d13e-08fc-4e60-a819-3b7532cac4ec
content-type: application/json; charset=UTF-8
{
"nodes-provision-status": [
{
"hostname": "abc.xyz.com",
"status": "Ready"
},
{
"hostname": "def.xyz.com",
"status": "Ready"
}
]
}
$ curl -X GET $URL/api/v1.0/site_statuses?filters=machines-power-state \
-H "X-Auth-Token:$TOKEN"
HTTP/1.1 200 OK
x-shipyard-req: 0804d13e-08fc-4e60-a819-3b7532cac4ec
content-type: application/json; charset=UTF-8
{
"machines-power-state": [
{
"hostname": "abc.xyz.com",
"power_state": "On"
},
{
"hostname": "def.xyz.com",
"power_state": "On"
}
]
}
$ curl -X GET $URL/api/v1.0/site_statuses?filters=nodes-provision-status%2Cmachines-power-state \
-H "X-Auth-Token:$TOKEN"
HTTP/1.1 200 OK
x-shipyard-req: 0804d13e-08fc-4e60-a819-3b7532cac4ec
content-type: application/json; charset=UTF-8
{
"nodes-provision-status": [
{
"hostname": "abc.xyz.com",
"status": "Ready"
},
{
"hostname": "def.xyz.com",
"status": "Ready"
}
],
"machines-power-state": [
{
"hostname": "abc.xyz.com",
"power_state": "On"
},
{
"hostname": "def.xyz.com",
"power_state": "On"
}
]
}
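A client can reduce a combined response to the set of hosts needing attention; in this sketch the "Deploying" and "Off" values are hypothetical placeholders (the examples above only show healthy hosts):

```python
def unhealthy_hosts(statuses):
    """Hostnames that are not Ready or not powered On."""
    bad = set()
    for entry in statuses.get("nodes-provision-status", []):
        if entry["status"] != "Ready":
            bad.add(entry["hostname"])
    for entry in statuses.get("machines-power-state", []):
        if entry["power_state"] != "On":
            bad.add(entry["hostname"])
    return sorted(bad)

statuses = {  # shaped like the combined example above; unhealthy values are invented
    "nodes-provision-status": [
        {"hostname": "abc.xyz.com", "status": "Ready"},
        {"hostname": "def.xyz.com", "status": "Deploying"},
    ],
    "machines-power-state": [
        {"hostname": "abc.xyz.com", "power_state": "On"},
        {"hostname": "def.xyz.com", "power_state": "Off"},
    ],
}
print(unhealthy_hosts(statuses))  # ['def.xyz.com']
```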
Logs Retrieval API¶
This API allows users to query and view logs. Its usage is currently limited to Airflow log retrieval, but it can be extended in the future to retrieve other logs. For instance, a possible use case might be to retrieve or tail the Kubernetes logs.
/v1.0/actions/{action_id}/steps/{step_id}/logs¶
This API allows users to query and view the logs for a particular workflow step in Airflow. By default, it will retrieve the logs from the last attempt. Note that a workflow step can retry multiple times with the names of the logs as 1.log, 2.log, 3.log, etc. A user can specify the try number to view the logs for a particular failed attempt, which will be useful during a troubleshooting session.
Entity Structure¶
Raw text of the logs retrieved from Airflow for that particular workflow step.
GET /v1.0/actions/{action_id}/steps/{step_id}/logs¶
Queries Airflow and retrieves logs for a particular workflow step.
Query parameters¶
- try={int try_number}
- optional, represents a particular attempt of the workflow step. If not specified, the logs of the latest attempt are retrieved.
Responses¶
- 200 OK
- If the logs are retrieved successfully.
- 4xx or 5xx
- If an error occurs during the HTTP request to Airflow, or Airflow responds with a status code of 400 or greater.
Example¶
curl -D - \
-X GET $URL/api/v1.0/actions/01CASSSZT7CP1F0NKHCAJBCJGR/steps/action_xcom/logs?try=2 \
-H "X-Auth-Token:$TOKEN"
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
x-shipyard-req: 49f74418-22b3-4629-8ddb-259bdfccf2fd
[2018-04-11 07:30:41,945] {{cli.py:374}} INFO - Running on host airflow-worker-0.airflow-worker-discovery.ucp.svc.cluster.local
[2018-04-11 07:30:41,991] {{models.py:1197}} INFO - Dependencies all met for <TaskInstance: deploy_site.action_xcom 2018-04-11 07:30:37 [queued]>
[2018-04-11 07:30:42,001] {{models.py:1197}} INFO - Dependencies all met for <TaskInstance: deploy_site.action_xcom 2018-04-11 07:30:37 [queued]>
[2018-04-11 07:30:42,001] {{models.py:1407}} INFO -
--------------------------------------------------------------------------------
Starting attempt 2 of 2
--------------------------------------------------------------------------------
[2018-04-11 07:30:42,022] {{models.py:1428}} INFO - Executing <Task(PythonOperator): action_xcom> on 2018-04-11 07:30:37
[2018-04-11 07:30:42,023] {{base_task_runner.py:115}} INFO - Running: ['bash', '-c', 'airflow run deploy_site action_xcom 2018-04-11T07:30:37 --job_id 2 --raw -sd DAGS_FOLDER/deploy_site.py']
[2018-04-11 07:30:42,606] {{base_task_runner.py:98}} INFO - Subtask: [2018-04-11 07:30:42,606] {{driver.py:120}} INFO - Generating grammar tables from /usr/lib/python3.5/lib2to3/Grammar.txt
[2018-04-11 07:30:42,635] {{base_task_runner.py:98}} INFO - Subtask: [2018-04-11 07:30:42,634] {{driver.py:120}} INFO - Generating grammar tables from /usr/lib/python3.5/lib2to3/PatternGrammar.txt
[2018-04-11 07:30:43,515] {{base_task_runner.py:98}} INFO - Subtask: [2018-04-11 07:30:43,515] {{configuration.py:206}} WARNING - section/key [celery/celery_ssl_active] not found in config
[2018-04-11 07:30:43,516] {{base_task_runner.py:98}} INFO - Subtask: [2018-04-11 07:30:43,515] {{default_celery.py:41}} WARNING - Celery Executor will run without SSL
[2018-04-11 07:30:43,517] {{base_task_runner.py:98}} INFO - Subtask: [2018-04-11 07:30:43,516] {{__init__.py:45}} INFO - Using executor CeleryExecutor
[2018-04-11 07:30:43,822] {{base_task_runner.py:98}} INFO - Subtask: [2018-04-11 07:30:43,821] {{models.py:189}} INFO - Filling up the DagBag from /usr/local/airflow/dags/deploy_site.py
[2018-04-11 07:30:43,892] {{cli.py:374}} INFO - Running on host airflow-worker-0.airflow-worker-discovery.ucp.svc.cluster.local
[2018-04-11 07:30:43,945] {{base_task_runner.py:98}} INFO - Subtask: [2018-04-11 07:30:43,944] {{python_operator.py:90}} INFO - Done. Returned value was: None
[2018-04-11 07:30:43,992] {{base_task_runner.py:98}} INFO - Subtask: """)
Notes Handling API¶
The notes facilities of Shipyard are primarily interwoven in other APIs. This endpoint adds the ability to retrieve additional information associated with a note. The first use case for this API is the retrieval of builddata from Drydock, which can be many hundreds of kilobytes of text.
/v1.0/notedetails/{note_id}¶
Retrieves the note details that are associated via URL with a note at the time of note creation. Unlike some responses from Shipyard, this API returns the remote information as-is, as the response body, without any further wrapping in a JSON structure.
Entity Structure¶
Raw text of the note's associated information.
GET /v1.0/notedetails/{note_id}¶
Looks up the specified note and follows the associated URL to retrieve information related to the note.
Query parameters¶
N/A
Responses¶
200 OK
Accompanied by the text looked up from the note's associated URL
400 Bad Request
When the note_id is not a valid ULID value.
404 Not Found
When the note does not exist, or the note does not have a URL associated.
500 Internal Server Error
When the remote source of the information cannot be accessed, or if there is a misconfiguration of the type of note preventing appropriate authorization checking.
Example¶
curl -D - \
-X GET $URL/api/v1.0/notedetails/01CASSSZT7CP1F0NKHCAJBCJGR \
-H "X-Auth-Token:$TOKEN"
HTTP/1.1 200 OK
x-shipyard-req: 49f74418-22b3-4629-8ddb-259bdfccf2fd
Potentially a lot of information here
Action Commands¶
Example invocation¶
API input to create an action follows this pattern, varying the name field:
Without Parameters:
POST /v1.0/actions
{"name" : "update_site"}
With Parameters:
POST /v1.0/actions
{
"name": "redeploy_server",
"parameters": {
"target_nodes": ["node1", "node2"]
}
}
POST /v1.0/actions
{
"name": "update_site",
"parameters": {
"continue-on-fail": "true"
}
}
Analogous CLI commands:
shipyard create action update_site
shipyard create action redeploy_server --param="target_nodes=node1,node2"
shipyard create action update_site --param="continue-on-fail=true"
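The CLI --param values map onto the API's parameters object; the sketch below mirrors the paired examples above (the comma-splitting of target_nodes is inferred from those examples, not taken from the client's actual implementation):

```python
def params_to_body(name, params):
    """Build an action-creation body from CLI-style key=value params.

    Illustrative only: mirrors the paired API/CLI examples above.
    """
    parameters = {}
    for p in params:
        key, _, value = p.partition("=")
        # target_nodes takes a comma-separated list of node names
        parameters[key] = value.split(",") if key == "target_nodes" else value
    return {"name": name, "parameters": parameters}

body = params_to_body("redeploy_server", ["target_nodes=node1,node2"])
print(body)  # {'name': 'redeploy_server', 'parameters': {'target_nodes': ['node1', 'node2']}}
```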
Supported actions¶
These actions are currently supported using the Action API and CLI:
deploy_site¶
Triggers the initial deployment of a site, using the latest committed configuration documents. Steps, conceptually:
- Concurrency check
- Prevents concurrent site modifications by conflicting actions/workflows.
- Preflight checks
- Ensures all Airship components are in a responsive state.
- Validate design
- Asks each involved Airship component to validate the design. This ensures that the previously committed design is valid at the present time.
- Drydock build
- Orchestrates the Drydock component to configure hardware and the Kubernetes environment (Drydock -> Promenade)
- Armada build
- Orchestrates Armada to configure software on the nodes as designed.
update_site¶
Applies a new committed configuration to the environment. The steps of update_site mirror those of deploy_site.
update_software¶
Triggers an update of the software in a site, using the latest committed configuration documents. Steps, conceptually:
- Concurrency check
- Prevents concurrent site modifications by conflicting actions/workflows.
- Validate design
- Asks each involved Airship component to validate the design. This ensures that the previously committed design is valid at the present time.
- Armada build
- Orchestrates Armada to configure software on the nodes as designed.
redeploy_server¶
Using parameters to indicate which server(s) to target, this action triggers a teardown and subsequent deployment of those servers to restore them to the current committed design.
This action is a target action, and does not apply the site action labels to the revision of documents in Deckhand. Application of site action labels is reserved for site actions such as deploy_site and update_site.
Like other target actions that will use a baremetal or Kubernetes node as a target, the target_nodes parameter will be used to list the names of the nodes that will be acted upon.
Using redeploy_server¶
Danger
At this time, there are no safeguards with regard to the running workload in place before tearing down a server and the result may be very disruptive to a working site. Users are cautioned to ensure the server being torn down is not running a critical workload. To support controlling this, the Shipyard service allows actions to be associated with RBAC rules. A deployment of Shipyard can restrict access to this action to help prevent unexpected disaster.
Redeploying a server can have consequences for the running workload as noted above. There are actions that a deployment engineer or system administrator can take before performing a redeploy_server to mitigate the risks and impact.
There are three broad categories of nodes that can be considered in regard to redeploy_server. It is possible that a node is both a Worker and a Control node depending on the deployment of Airship:
Broken Node:
A non-functional node, e.g. a host that has been corrupted to the point of being unable to participate in the Kubernetes cluster.
Worker Node:
A node that is participating in the Kubernetes cluster, not running control plane software, but providing capacity for workloads running in the environment.
Control Node:
A node that is participating in the Kubernetes cluster and is hosting control plane software. E.g. Airship or other components that serve as controllers for the rest of the cluster in some way. These nodes may run software such as etcd or databases that contribute to the health of the overall Kubernetes cluster.
Note that there is also the Genesis host, used to bootstrap the Airship platform. This node currently runs the Airship containers, including some that cannot yet be migrated to other nodes, e.g. the MAAS rack controller, and some whose relocation would be disruptive, e.g. PostgreSQL.
Important
Use of redeploy_server on the Airship Genesis host/node is not supported, and will result in serious disruption.
- Yes
- Recommended step for this node type
- No
- Generally not necessary for this node type
- N/A
- Not applicable for this node type
Action | Command(s) | Broken | Worker | Control
---|---|---|---|---
Coordinate workload impacts with users [*] | | Yes | Yes | No
Clear Kubernetes labels from the node (for each label) | $ kubectl label nodes <node> <label>- | N/A | Yes | Yes
Etcd - check for cluster health | $ kubectl -n kube-system exec kubernetes-etcd-<hostname> etcdctl member list | N/A | N/A | Yes
Drain Kubernetes node | $ kubectl drain <node> | N/A | Yes | Yes
Disable the kubelet service | | N/A | Yes | Yes
Remove node from Kubernetes | $ kubectl delete node <node> | Yes | Yes | Yes
Backup Disks (processes vary) [†] | | Yes | Yes | Yes

[*] Of course it is up to the infrastructure operator if they wish to coordinate with their users. This guide assumes client or user communication as a common courtesy.

[†] Server redeployment will (quick) erase all disks during the process, but desired enhancements to redeploy_server may include options for disk handling. Situationally, it may not be necessary to back up disks if the underlying implementation already provides the needed resiliency and redundancy.
relabel_nodes¶
Using parameters to indicate which server(s) to target, this action triggers an update to the Kubernetes node labels for those servers.
test_site¶
Triggers the execution of the Helm tests corresponding to all deployed releases in all namespaces. Steps, conceptually:
- Preflight checks
- Ensures all Airship components are in a responsive state.
- Armada test
- Invokes Armada to execute Helm tests for all releases.
Using test_site¶
The test_site action accepts one optional parameter:
- release: The name of a release to test. When provided, tests are only executed for the specified release.
An example of invoking Helm tests for a single release:
shipyard create action test_site --param="namespace=openstack" --param="release=keystone"
update labels¶
Triggers an update to the Kubernetes node labels for the specified server(s).
Shipyard CLI¶
Environment Variables¶
All commands will utilize the following environment variables to determine necessary information for execution, unless otherwise noted.
OpenStack Keystone Authorization environment variables¶
The Shipyard CLI/API Client will check for the presence of appropriate environment setup to do authentication on behalf of the user. The openrc variables that will be used are as follows:
- OS_PROJECT_DOMAIN_NAME ("default" if not specified)
- OS_USER_DOMAIN_NAME ("default" if not specified)
- OS_DOMAIN_NAME
- OS_AUTH_TOKEN
- OS_PROJECT_NAME
- OS_USERNAME
- OS_PASSWORD
- OS_AUTH_URL
- The fully qualified identity endpoint. E.g. http://keystone.ucp.fully.qualified.name:80/v3
OpenStack Keystone Authorization environment variables not used¶
These OpenStack identity variables are not supported by Shipyard.
- OS_IDENTITY_API_VERSION
- This value will be ignored as Shipyard only supports version 3 at this time
Shipyard command options¶
The base shipyard command supports options that determine cross-CLI behaviors. These options are positioned immediately following the shipyard command as shown here:
shipyard <--these options> subcommands...
shipyard
[--context-marker=<uuid>]
[--debug/--no-debug]
[--os-{various}=<value>]
[--output-format=[format | raw | cli]] (default = cli)
[--verbosity=[0-5]] (default = 1)
<subcommands, as noted in this document>
- --context-marker=<uuid>
- Specifies a UUID (8-4-4-4-12 format) that will be used to correlate logs, transactions, etc... in downstream activities triggered by this interaction. If not specified, Shipyard will supply a new UUID to serve as this marker. (optional)
- --debug | --no-debug
- Enable/disable debugging of this CLI and API client. Defaults to no debug
- --os-<various>=<value>
- See supported OpenStack Keystone Authorization Environment variables above for the list of supported names, converting to a downcase version of the environment variable. E.g.: --os-auth-url=http://keystone.ucp:80/v3 If not specified, the environment variables matching these options will be used instead. The Keystone os-auth-url should reference the exposed keystone:port for the target Shipyard environment, as this Keystone will be used to discover the instance of Shipyard. For most invocations other than help, a valid combination of values must be resolved to authenticate and authorize the user's invocation.
- --output-format=<format | raw | cli>
Specifies the desired output formatting such that:
- format
- Display the raw output from the invoked Shipyard API in a column restricted mode.
- raw
- Display the result from the invoked Shipyard API as-is, without modification.
- cli (default)
- Display results in a plain text interpretation of the response from the invoked Shipyard API.
- --verbosity=<0-5>
- Integer value specifying the level of verbosity for the response information gathered from the API server. A verbosity of 0 will remove all additional information from the response, a verbosity of 1 will include summary-level notes and information, and 5 will include all available information. This setting does not necessarily affect all of the CLI commands, but may be set on all invocations. A default value of 1 is used if not specified.
Commit Commands¶
commit configdocs¶
Attempts to commit the Shipyard Buffer documents, first invoking validation by downstream components.
shipyard commit configdocs
[--force]
[--dryrun]
Example:
shipyard commit configdocs
- --force
- Force the commit to occur, even if validations fail.
- --dryrun
- Retrieve validation status for the contents of the buffer without committing.
Sample¶
$ shipyard commit configdocs
Configuration documents committed.
Status: Validations succeeded
Reason: Validation
- Info: DD1001
Message: Rational Boot Storage: Validation successful.
Source: Drydock
- Info: DD2002
Message: IP Locality Check: Validation successful.
Source: Drydock
- Info: DD2003
Message: MTU Rationality: Validation successful.
Source: Drydock
- Info: DD2004
Message: Network Trunking Rationalty: Validation successful.
Source: Drydock
- Info: DD2005
Message: Duplicated IP Check: Validation successful.
Source: Drydock
- Info: DD3001
Message: Platform Selection: Validation successful.
Source: Drydock
- Info: DD1006
Message: Network Bond Rationality: Validation successful.
Source: Drydock
- Info: DD2002
Message: Storage Partitioning: Validation successful.
Source: Drydock
- Info: DD2003
Message: Storage Sizing: Validation successful.
Source: Drydock
- Info: DD1007
Message: Allowed Network Check: Validation successful.
Source: Drydock
#### Errors: 0, Warnings: 0, Infos: 10, Other: 0 ####
Control commands¶
pause, unpause, stop¶
Three separate commands with a common format that allow for controlling the processing of actions created in Shipyard.
- pause
- pause something in progress e.g. an executing action
- unpause
- unpause something paused e.g. a paused action
- stop
- stops an executing or paused item e.g. an action
shipyard pause
<type>
<id>
shipyard unpause
<type>
<id>
shipyard stop
<type>
<id>
shipyard
pause|unpause|stop
<qualified name>
Example:
shipyard pause action 01BTG32JW87G0YKA1K29TKNAFX
shipyard unpause action 01BTG32JW87G0YKA1K29TKNAFX
shipyard stop action 01BTG32JW87G0YKA1K29TKNAFX
shipyard pause action/01BTG32JW87G0YKA1K29TKNAFX
- <type>
- The type of entity to take action upon. Currently supports: action
- <id>
- The id of the entity to take action upon.
- <qualified name>
- The qualified name of the item to take the specified action upon
Sample¶
$ shipyard pause action/01BZZMEXAVYGG7BT0BMA3RHYY7
pause successfully submitted for action 01BZZMEXAVYGG7BT0BMA3RHYY7
A failed command:
$ shipyard pause action/01BZZK07NF04XPC5F4SCTHNPKN
Error: Unable to pause action
Reason: dag_run state must be running, but is failed
- Error: dag_run state must be running, but is failed
Create Commands¶
create action¶
Invokes the specified workflow through Shipyard. Returns the id of the action invoked so that it can be queried subsequently.
shipyard create action
<action_command>
--param=<parameter> (repeatable)
[--allow-intermediate-commits]
Example:
shipyard create action redeploy_server --param="target_nodes=mcp"
shipyard create action update_site --param="continue-on-fail=true"
- <action_command>
- The action to invoke.
- --param=<parameter>
- A parameter to be provided to the action being invoked. (repeatable) Note that different information can be passed to the created action's workflow, e.g. the name of the server to be redeployed, or whether to continue the workflow if there are failures in Drydock, such as failed health checks.
- --allow-intermediate-commits
- Allows continuation of a site action, e.g. update_site even when the current committed revision of documents has other prior commits that have not been used as part of a site action.
Sample¶
$ shipyard create action deploy_site
Name Action Lifecycle
deploy_site action/01BZZK07NF04XPC5F4SCTHNPKN None
create configdocs¶
Load documents into the Shipyard Buffer. The use of one or more filenames or one or more directory options must be specified.
shipyard create configdocs
<collection>
[--append | --replace] [--empty-collection]
--filename=<filename> (repeatable)
|
--directory=<directory> (repeatable)
Example:
shipyard create configdocs design --append --filename=site_design.yaml
Note
If neither append nor replace are specified, the Shipyard API default value of rejectoncontents will be used.
Note
--filename and/or --directory must be specified unless --empty-collection is used.
- <collection>
- The collection to load.
- --append
- Add the collection to the Shipyard Buffer. This will fail if the collection already exists.
- --replace
- Clear the shipyard buffer and replace it with the specified contents.
- --empty-collection
- Indicate to Shipyard that the named collection should be made empty (contain no documents). If --empty-collection is specified, the files named by --filename or --directory will be ignored.
- --filename=<filename>
- The file name to use as the contents of the collection. (repeatable) If any documents specified fail basic validation, all of the documents will be rejected. The filename parameter may not be used in conjunction with the directory parameter.
- --directory=<directory>
- A directory containing documents that will be joined and loaded as a collection. (repeatable) Any documents that fail basic validation will reject the whole set. The directory parameter may not be used with the filename parameter.
- --recurse
- Recursively search through all directories for sub-directories that contain yaml files.
Sample¶
$ shipyard create configdocs coll1 --filename=/home/ubuntu/yaml/coll1.yaml
Configuration documents added.
Status: Validations succeeded
Reason: Validation
Attempting to load the same collection into the uncommitted buffer.
$ shipyard create configdocs coll1 --filename=/home/ubuntu/yaml/coll1.yaml
Error: Invalid collection specified for buffer
Reason: Buffermode : rejectoncontents
- Error: Buffer is either not empty or the collection already exists in buffer. Setting a different buffermode may provide the desired functionality
Replace the buffer with --replace
$ shipyard create configdocs coll1 --replace --filename=/home/ubuntu/yaml/coll1.yaml
Configuration documents added.
Status: Validations succeeded
Reason: Validation
Describe Commands¶
describe¶
Retrieves the detailed information about the supplied namespaced item
shipyard describe
<namespaced_item>
Example:
shipyard describe action/01BTG32JW87G0YKA1K29TKNAFX
Equivalent to:
shipyard describe action 01BTG32JW87G0YKA1K29TKNAFX
shipyard describe notedetails/01BTG32JW87G0YKA1KA9TBNA32
Equivalent to:
shipyard describe notedetails 01BTG32JW87G0YKA1KA9TBNA32
shipyard describe step/01BTG32JW87G0YKA1K29TKNAFX/preflight
Equivalent to:
shipyard describe step preflight --action=01BTG32JW87G0YKA1K29TKNAFX
shipyard describe validation/01BTG32JW87G0YKA1K29TKNAFX/01BTG3PKBS15KCKFZ56XXXBGF2
Equivalent to:
shipyard describe validation 01BTG3PKBS15KCKFZ56XXXBGF2 \
--action=01BTG32JW87G0YKA1K29TKNAFX
shipyard describe workflow/deploy_site__2017-01-01T12:34:56.123456
Equivalent to:
shipyard describe workflow deploy_site__2017-01-01T12:34:56.123456
describe action¶
Retrieves the detailed information about the supplied action id.
shipyard describe action
<action_id>
Example:
shipyard describe action 01BTG32JW87G0YKA1K29TKNAFX
Sample¶
$ shipyard describe action/01BZZK07NF04XPC5F4SCTHNPKN
Name: deploy_site
Action: action/01BZZK07NF04XPC5F4SCTHNPKN
Lifecycle: Failed
Parameters: {}
Datetime: 2017-11-27 20:34:24.610604+00:00
Dag Status: failed
Context Marker: 71d4112e-8b6d-44e8-9617-d9587231ffba
User: shipyard
Steps Index State Footnotes
step/01BZZK07NF04XPC5F4SCTHNPKN/action_xcom 1 success
step/01BZZK07NF04XPC5F4SCTHNPKN/dag_concurrency_check 2 success
step/01BZZK07NF04XPC5F4SCTHNPKN/deckhand_get_design_version 3 failed (1)
step/01BZZK07NF04XPC5F4SCTHNPKN/validate_site_design 4 None
step/01BZZK07NF04XPC5F4SCTHNPKN/deckhand_get_design_version 5 failed
step/01BZZK07NF04XPC5F4SCTHNPKN/deckhand_get_design_version 6 failed
step/01BZZK07NF04XPC5F4SCTHNPKN/drydock_build 7 None
Step Footnotes Note
(1) > step metadata: deckhand_get_design_version(2017-11-27 20:34:34.443053): Unable to determine version
- Info available with 'describe notedetails/09876543210987654321098765'
Commands User Datetime
invoke shipyard 2017-11-27 20:34:34.443053+00:00
Validations: None
Action Notes:
> action metadata: 01BZZK07NF04XPC5F4SCTHNPKN(2017-11-27 20:34:24.610604): Invoked using revision 3
describe notedetails¶
Retrieves extended information related to a note.
shipyard describe notedetails <note_id>
Example:
shipyard describe notedetails/01BTG32JW87G0YKA1KA9TBNA32
- <note_id>
- The id of the note referenced as having more details in a separate response
Sample¶
$ shipyard describe notedetails/01BTG32JW87G0YKA1KA9TBNA32
<a potentially large amount of data as returned by the source of info>
describe step¶
Retrieves the step details associated with an action and step.
shipyard describe step
<step_id>
--action=<action id>
Example:
shipyard describe step preflight --action=01BTG32JW87G0YKA1K29TKNAFX
- <step id>
- The id of the step found in the describe action response.
- --action=<action id>
- The action id that provides the context for this step.
Sample¶
$ shipyard describe step/01BZZK07NF04XPC5F4SCTHNPKN/action_xcom
Name: action_xcom
Task ID: step/01BZZK07NF04XPC5F4SCTHNPKN/action_xcom
Index: 1
State: success
Start Date: 2017-11-27 20:34:45.604109
End Date: 2017-11-27 20:34:45.818946
Duration: 0.214837
Try Number: 1
Operator: PythonOperator
Step Notes:
> step metadata: deckhand_get_design_version(2017-11-27 20:34:34.443053): Unable to determine version
- Info available with 'describe notedetails/09876543210987654321098765'
describe validation¶
Retrieves the validation details associated with an action and validation id.
shipyard describe validation
<validation_id>
--action=<action_id>
Example:
shipyard describe validation 01BTG3PKBS15KCKFZ56XXXBGF2 \
--action=01BTG32JW87G0YKA1K29TKNAFX
- <validation_id>
- The id of the validation found in the describe action response.
- --action=<action_id>
- The action id that provides the context for this validation.
Sample¶
TBD
describe workflow¶
Retrieves the details for a workflow that is running or has run in the workflow engine.
shipyard describe workflow
<workflow_id>
Example:
shipyard describe workflow deploy_site__2017-01-01T12:34:56.123456
- <workflow_id>
- The id of the workflow found in the get workflows response.
Sample¶
$ shipyard describe workflow deploy_site__2017-11-27T20:34:33.000000
Workflow: deploy_site__2017-11-27T20:34:33.000000
State: failed
Dag ID: deploy_site
Execution Date: 2017-11-27 20:34:33
Start Date: 2017-11-27 20:34:33.979594
End Date: None
External Trigger: True
Steps State
action_xcom success
dag_concurrency_check success
deckhand_get_design_version failed
validate_site_design None
deckhand_get_design_version failed
deckhand_get_design_version failed
drydock_build None
Subworkflows:
Workflow: deploy_site.deckhand_get_design_version__2017-11-27T20:34:33.000000
State: failed
Dag ID: deploy_site.deckhand_get_design_version
Execution Date: 2017-11-27 20:34:33
Start Date: 2017-11-27 20:35:06.281825
End Date: None
External Trigger: False
Workflow: deploy_site.deckhand_get_design_version.deckhand_get_design_version__2017-11-27T20:34:33.000000
State: failed
Dag ID: deploy_site.deckhand_get_design_version.deckhand_get_design_version
Execution Date: 2017-11-27 20:34:33
Start Date: 2017-11-27 20:35:20.725506
End Date: None
External Trigger: False
Get Commands¶
get actions¶
Lists the actions that have been invoked.
shipyard get actions
Sample¶
$ shipyard get actions
Name Action Lifecycle Execution Time Step Succ/Fail/Oth Footnotes
deploy_site action/01BTP9T2WCE1PAJR2DWYXG805V Failed 2017-09-23T02:42:12 12/1/3 (1)
update_site action/01BZZKMW60DV2CJZ858QZ93HRS Processing 2017-09-23T04:12:21 6/0/10 (2)
Action Footnotes Note
(1) > action metadata:01BTP9T2WCE1PAJR2DWYXG805V(2017-09-23 02:42:23.346534): Invoked with revision 3
(2) > action metadata:01BZZKMW60DV2CJZ858QZ93HRS(2017-09-23 04:12:31.465342): Invoked with revision 4
get configdocs¶
Retrieve documents loaded into Shipyard. The possible options include last committed, last site action, last successful site action, and retrieval from the Shipyard Buffer. Site actions include deploy_site, update_site and update_software. Note that only one option may be selected when retrieving the documents for a particular collection.
The command will compare the differences between the revisions specified if the collection option is not specified. Note that only two revisions may be compared at a time. The relevant Deckhand revision id will be shown in the output as well.
If both collection and revisions are not specified, the output will show the differences between the 'committed' and 'buffer' revision (default behavior).
shipyard get configdocs
[--collection=<collection>]
[--committed | --last-site-action | --successful-site-action | --buffer]
[--cleartext-secrets]
Example:
shipyard get configdocs --collection=design
shipyard get configdocs --collection=design --last-site-action
shipyard get configdocs
shipyard get configdocs --committed --last-site-action
- --collection=<collection>
- The collection to retrieve for viewing. If no collection is entered, the status of the collections in the buffer and those that are committed will be displayed.
- --committed
- Retrieve the documents that have last been committed for this collection
- --last-site-action
- Retrieve the documents associated with the last successful or failed site action for this collection
- --successful-site-action
- Retrieve the documents associated with the last successful site action for this collection
- --buffer
- Retrieve the documents that have been loaded into Shipyard since the prior commit. If no documents have been loaded into the buffer for this collection, this will return an empty response (default)
- --cleartext-secrets
- Returns secrets as cleartext for encrypted documents if the user has the appropriate permissions in the target environment. If the user does not have the appropriate permissions and sets this flag to true an error is returned. Only impacts returned documents, not lists of documents.
Sample¶
$ shipyard get configdocs
Comparing Base: committed (Deckhand revision 2)
to New: buffer (Deckhand revision 3)
Collection Base New
coll1 present unmodified
coll2 not present created
$ shipyard get configdocs --committed --last-site-action
Comparing Base: last_site_action (Deckhand revision 2)
to New: committed (Deckhand revision 2)
Collection Base New
secrets present unmodified
design present unmodified
$ shipyard get configdocs --collection=coll1
data:
chart_groups: [kubernetes-proxy, container-networking, dns, kubernetes, kubernetes-rbac]
release_prefix: ucp
id: 1
metadata:
layeringDefinition: {abstract: false, layer: site}
name: cluster-bootstrap-1
schema: metadata/Document/v1.0
storagePolicy: cleartext
schema: armada/Manifest/v1.0
status: {bucket: coll1, revision: 1}
get renderedconfigdocs¶
Retrieve the rendered version of documents loaded into Shipyard. Rendered documents are the "final" version of the documents after applying Deckhand layering and substitution.
shipyard get renderedconfigdocs
[--committed | --last-site-action | --successful-site-action | --buffer]
[--cleartext-secrets]
Example:
shipyard get renderedconfigdocs
- --committed
- Retrieve the documents that have last been committed.
- --last-site-action
- Retrieve the documents associated with the last successful or failed site action.
- --successful-site-action
- Retrieve the documents associated with the last successful site action.
- --buffer
- Retrieve the documents that have been loaded into Shipyard since the prior commit. (default)
- --cleartext-secrets
- Returns secrets as cleartext for encrypted documents if the user has the appropriate permissions in the target environment. If the user does not have the appropriate permissions and sets this flag to true an error is returned.
Sample¶
$ shipyard get renderedconfigdocs
data:
chart_groups: [kubernetes-proxy, container-networking, dns, kubernetes, kubernetes-rbac]
release_prefix: ucp
id: 1
metadata:
layeringDefinition: {abstract: false, layer: site}
name: cluster-bootstrap-1
schema: metadata/Document/v1.0
storagePolicy: cleartext
schema: armada/Manifest/v1.0
status: {bucket: coll1, revision: 1}
get workflows¶
Retrieve workflows that are running or have run in the workflow engine. This includes processes that may not have been started as an action (e.g. scheduled tasks).
shipyard get workflows
[--since=<date>]
Example:
shipyard get workflows
shipyard get workflows --since=2017-01-01T12:34:56.123456
- --since=<date>
- The historical cutoff date to limit the results of this response.
Sample¶
$ shipyard get workflows
Workflows State
deploy_site__2017-11-27T20:34:33.000000 failed
update_site__2017-11-27T20:45:47.000000 running
get site-statuses¶
Retrieve the provisioning status of nodes and/or power states of the baremetal machines in the site. If no option is provided, records for both status types are retrieved.
shipyard get site-statuses
[--status-type=<status-type>] (repeatable)
Example:
shipyard get site-statuses
shipyard get site-statuses --status-type=nodes-provision-status
shipyard get site-statuses --status-type=machines-power-state
shipyard get site-statuses --status-type=nodes-provision-status --status-type=machines-power-state
- --status-type=<status-type>
- Retrieve provisioning statuses of all nodes for status-type "nodes-provision-status" and retrieve power states of all baremetal machines in the site for status-type "machines-power-state".
Sample¶
$ shipyard get site-statuses
Nodes Provision Status:
Hostname Status
abc.xyz.com Ready
def.xyz.com Deploying
Machines Power State:
Hostname Power State
abc.xyz.com On
def.xyz.com On
$ shipyard get site-statuses --status-type=nodes-provision-status
Nodes Provision Status:
Hostname Status
abc.xyz.com Ready
def.xyz.com Deploying
$ shipyard get site-statuses --status-type=machines-power-state
Machines Power State:
Hostname Power State
abc.xyz.com On
def.xyz.com On
$ shipyard get site-statuses --status-type=nodes-provision-status --status-type=machines-power-state
Nodes Provision Status:
Hostname Status
abc.xyz.com Ready
def.xyz.com Deploying
Machines Power State:
Hostname Power State
abc.xyz.com On
def.xyz.com On
Logs Commands¶
logs¶
Retrieves the logs of the supplied namespaced item.
shipyard logs
<namespaced_item>
Example:
shipyard logs step/01BTG32JW87G0YKA1K29TKNAFX/drydock_validate_site_design
Equivalent to:
shipyard logs step drydock_validate_site_design --action=01BTG32JW87G0YKA1K29TKNAFX
shipyard logs step/01BTG32JW87G0YKA1K29TKNAFX/drydock_validate_site_design/2
Equivalent to:
shipyard logs step drydock_validate_site_design --action=01BTG32JW87G0YKA1K29TKNAFX --try=2
logs step¶
Retrieves the logs for a particular workflow step. Note that 'try' is an optional parameter.
shipyard logs step
<step_id> --action=<action_name> [--try=<try>]
Example:
shipyard logs step drydock_validate_site_design --action=01BTG32JW87G0YKA1K29TKNAFX
shipyard logs step drydock_validate_site_design --action=01BTG32JW87G0YKA1K29TKNAFX --try=2
Sample¶
$ shipyard logs step/01C9VVQSCFS7V9QB5GBS3WFVSE/action_xcom
[2018-04-11 07:30:41,945] {{cli.py:374}} INFO - Running on host airflow-worker-0.airflow-worker-discovery.ucp.svc.cluster.local
[2018-04-11 07:30:41,991] {{models.py:1197}} INFO - Dependencies all met for <TaskInstance: deploy_site.action_xcom 2018-04-11 07:30:37 [queued]>
[2018-04-11 07:30:42,001] {{models.py:1197}} INFO - Dependencies all met for <TaskInstance: deploy_site.action_xcom 2018-04-11 07:30:37 [queued]>
[2018-04-11 07:30:42,001] {{models.py:1407}} INFO -
--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------
[2018-04-11 07:30:42,022] {{models.py:1428}} INFO - Executing <Task(PythonOperator): action_xcom> on 2018-04-11 07:30:37
[2018-04-11 07:30:42,023] {{base_task_runner.py:115}} INFO - Running: ['bash', '-c', 'airflow run deploy_site action_xcom 2018-04-11T07:30:37 --job_id 2 --raw -sd DAGS_FOLDER/deploy_site.py']
[2018-04-11 07:30:42,606] {{base_task_runner.py:98}} INFO - Subtask: [2018-04-11 07:30:42,606] {{driver.py:120}} INFO - Generating grammar tables from /usr/lib/python3.5/lib2to3/Grammar.txt
[2018-04-11 07:30:42,635] {{base_task_runner.py:98}} INFO - Subtask: [2018-04-11 07:30:42,634] {{driver.py:120}} INFO - Generating grammar tables from /usr/lib/python3.5/lib2to3/PatternGrammar.txt
[2018-04-11 07:30:43,515] {{base_task_runner.py:98}} INFO - Subtask: [2018-04-11 07:30:43,515] {{configuration.py:206}} WARNING - section/key [celery/celery_ssl_active] not found in config
[2018-04-11 07:30:43,516] {{base_task_runner.py:98}} INFO - Subtask: [2018-04-11 07:30:43,515] {{default_celery.py:41}} WARNING - Celery Executor will run without SSL
[2018-04-11 07:30:43,517] {{base_task_runner.py:98}} INFO - Subtask: [2018-04-11 07:30:43,516] {{__init__.py:45}} INFO - Using executor CeleryExecutor
[2018-04-11 07:30:43,822] {{base_task_runner.py:98}} INFO - Subtask: [2018-04-11 07:30:43,821] {{models.py:189}} INFO - Filling up the DagBag from /usr/local/airflow/dags/deploy_site.py
[2018-04-11 07:30:43,892] {{cli.py:374}} INFO - Running on host airflow-worker-0.airflow-worker-discovery.ucp.svc.cluster.local
[2018-04-11 07:30:43,945] {{base_task_runner.py:98}} INFO - Subtask: [2018-04-11 07:30:43,944] {{python_operator.py:90}} INFO - Done. Returned value was: None
[2018-04-11 07:30:43,992] {{base_task_runner.py:98}} INFO - Subtask: """)
Help Commands¶
help¶
Provides topical help for shipyard.
Note
--help will provide more specific command help.
shipyard help
[<topic>]
Example:
shipyard help configdocs
- <topic>
- The topic of the help to be displayed. If this parameter is not specified the list of available topics will be displayed.
Sample¶
$ shipyard help
THE SHIPYARD COMMAND
The base shipyard command supports options that determine cross-CLI behaviors.
FORMAT
shipyard [--context-marker=<uuid>] [--os_{various}=<value>]
[--debug/--no-debug] [--output-format] <subcommands>
Please Note: --os_auth_url is required for every command except shipyard help
<topic>.
TOPICS
For information of the following topics, run shipyard help <topic>
actions
configdocs
Site Definition Documents¶
Shipyard requires some documents to be loaded as part of the site definition for the deploy_site and update_site actions, as well as for other workflows that directly deal with site deployments.
Schemas¶
- DeploymentConfiguration schema
- DeploymentStrategy schema
Deployment Configuration¶
Allows for specification of configurable options used by the site deployment related workflows, including the timeouts used for various steps, and the name of the Armada manifest that will be used during the deployment/update.
A sample deployment-configuration shows a completely specified example.
Note that the name and schema Shipyard expects the deployment configuration document to have is configurable via the document_info section in the Shipyard configuration, but should be left defaulted in most cases.
Default configuration values are provided for most values.
Supported values¶
- Section: physical_provisioner:
Values in the physical_provisioner section apply to the interactions with Drydock in the various steps taken to deploy or update bare-metal servers and networking.
- deployment_strategy
- The name of the deployment strategy document to be used. There is a default deployment strategy that is used if this field is not present.
- deploy_interval
- The seconds delayed between checks for progress of the step that performs deployment of servers.
- deploy_timeout
- The maximum seconds allowed for the step that performs deployment of all servers.
- destroy_interval
- The seconds delayed between checks for progress of destroying hardware nodes.
- destroy_timeout
- The maximum seconds allowed for destroying hardware nodes.
- join_wait
- The number of seconds allowed for a node to join the Kubernetes cluster.
- prepare_node_interval
- The seconds delayed between checks for progress of preparing nodes.
- prepare_node_timeout
- The maximum seconds allowed for preparing nodes.
- prepare_site_interval
- The seconds delayed between checks for progress of preparing the site.
- prepare_site_timeout
- The maximum seconds allowed for preparing the site.
- verify_interval
- The seconds delayed between checks for progress of verification.
- verify_timeout
- The maximum seconds allowed for verification by Drydock.
- Section: kubernetes_provisioner:
Values in the kubernetes_provisioner section apply to interactions with Promenade in the various steps of redeploying servers.
- drain_timeout
- The maximum seconds allowed for draining a node.
- drain_grace_period
- The seconds provided to Promenade as a grace period for pods to cease.
- clear_labels_timeout
- The maximum seconds provided to Promenade to clear labels on a node.
- remove_etcd_timeout
- The maximum seconds provided to Promenade to allow for removing etcd from a node.
- etcd_ready_timeout
- The maximum seconds allowed for etcd to reach a healthy state after a node is removed.
- Section: armada:
The Armada section provides configuration for the workflow interactions with Armada.
- manifest
- The name of the Armada manifest document that the workflow will use during site deployment activities. E.g.: 'full-site'
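Pulling these sections together, a deployment-configuration document might look like the following abbreviated sketch. The schema and document name shown are the expected defaults; the timeout and interval values are illustrative only, not recommendations, and the linked sample shows a completely specified document:

```yaml
---
schema: shipyard/DeploymentConfiguration/v1
metadata:
  schema: metadata/Document/v1
  name: deployment-configuration
  layeringDefinition:
    abstract: false
    layer: global
  storagePolicy: cleartext
data:
  physical_provisioner:
    # Name of the DeploymentStrategy document; omit to use the default strategy
    deployment_strategy: deployment-strategy
    deploy_interval: 30      # seconds between progress checks (illustrative)
    deploy_timeout: 3600     # max seconds for server deployment (illustrative)
  kubernetes_provisioner:
    drain_timeout: 3600
    drain_grace_period: 1800
  armada:
    manifest: 'full-site'
```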
Deployment Strategy¶
The deployment strategy document is optionally specified in the Deployment Configuration and provides a way to group, sequence, and test the deployments of groups of hosts deployed using Drydock. A sample deployment-strategy shows one possible strategy, in the context of the Shipyard unit testing.
Using A Deployment Strategy¶
Defining a deployment strategy involves understanding the design of a site, and the desired criticality of the nodes that make up the site.
A typical site may include a handful or many servers that participate in a Kubernetes cluster. Several of the servers may serve as control nodes, while others will handle the workload of the site. During the deployment of a site, it may be critically important that some servers are operational, while others may have a higher tolerance for misconfigured or failed nodes.
The deployment strategy provides a mechanism to handle defining groups of nodes such that the criticality is reflected in the success criteria.
The name of the DeploymentStrategy document to use is defined in the Deployment Configuration, in the physical_provisioner.deployment_strategy field. The simplest deployment strategy is used if one is not specified in the Deployment Configuration document for the site. Example:
schema: shipyard/DeploymentStrategy/v1
metadata:
schema: metadata/Document/v1
name: deployment-strategy
layeringDefinition:
abstract: false
layer: global
storagePolicy: cleartext
data:
groups: [
- name: default
critical: true
depends_on: []
selectors: [
- node_names: []
node_labels: []
node_tags: []
rack_names: []
]
success_criteria:
percent_successful_nodes: 100
]
- This default configuration indicates that there are no selectors, meaning that all nodes in the design are included.
- The criticality is set to true, meaning that the workflow will halt if the success criteria are not met.
- The success criteria indicate that all nodes must be successful to consider the group a success.
Note that the schema Shipyard expects the deployment strategy document to have is configurable via the document_info section in the Shipyard configuration, but should be left defaulted in most cases.
In short, the default behavior is to deploy everything all at once, and halt if there are any failures.
In a large deployment, this could be a problematic strategy as the chance of success in one try goes down as complexity rises. A deployment strategy provides a means to mitigate the unforeseen.
To define a deployment strategy, an example may be helpful, but first definition of the fields follow:
Groups¶
Groups are named sets of nodes that will be deployed together. The fields of a group are:
- name
- Required. The identifying name of the group.
- critical
- Required. Indicates if this group is required to continue to additional phases of deployment.
- depends_on
- Required, may be an empty list. Group names that must be successful before this group can be processed.
- selectors
- Required, may be an empty list. A list of identifying information to indicate the nodes that are members of this group.
- success_criteria
- Optional. Criteria that must evaluate to be true before a group is considered successfully complete with a phase of deployment.
Criticality¶
- Field: critical
- Valid values: true | false
Each group is required to indicate true or false for the critical field. This drives the behavior after the deployment of baremetal nodes. If any groups that are marked as critical: true fail to meet that group's success criteria, the workflow will halt after the deployment of baremetal nodes. A group that cannot be processed due to a parent dependency failing will be considered failed, regardless of the success criteria.
Dependencies¶
- Field: depends_on
- Valid values: [] or a list of group names
Each group specifies a list of depends_on groups, or an empty list. All identified groups must complete successfully for the phase of deployment before the current group is allowed to be processed by the current phase.
- A failure (based on success criteria) of a group prevents any groups dependent upon the failed group from being attempted.
- Circular dependencies will be rejected as invalid during document validation.
- There is no guarantee of ordering among groups that have their dependencies met. Any group that is ready for deployment based on declared dependencies will execute, however execution of groups is serialized - two groups will not deploy at the same time.
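The dependency rules above can be sketched as follows. This is an illustrative model, not Shipyard's implementation; the group names echo the example strategy later in this document, and the status values are invented for the example:

```python
def ready_groups(groups, status):
    """Unprocessed groups whose dependencies have all succeeded."""
    return [name for name, deps in groups.items()
            if status.get(name) is None
            and all(status.get(d) == "success" for d in deps)]

def propagate_failure(groups, status):
    """Mark every group with a failed (direct or transitive) dependency as failed."""
    changed = True
    while changed:
        changed = False
        for name, deps in groups.items():
            if status.get(name) is None and any(
                    status.get(d) == "failed" for d in deps):
                status[name] = "failed"
                changed = True

# The example strategy's dependency graph, with ntp-node having failed:
groups = {
    "ntp-node": [],
    "monitoring-nodes": [],
    "control-nodes": ["ntp-node"],
    "compute-nodes-1": ["control-nodes"],
    "compute-nodes-2": ["control-nodes"],
}
status = {"ntp-node": "failed"}
propagate_failure(groups, status)
# Only monitoring-nodes remains processable; the compute groups fail
# transitively even though they depend only on control-nodes.
```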
Selectors¶
- Field: selectors
- Valid values: [] or a list of selectors
The list of selectors indicates the nodes that will be included in a group. Each selector has four available filtering values: node_names, node_tags, node_labels, and rack_names. Each selector is an intersection of these criteria, while the list of selectors is a union of the individual selectors.
- Omitting a criterion from a selector, or using an empty list, means that criterion is ignored.
- Having a completely empty list of selectors, or a selector that has no criteria specified, indicates ALL nodes.
- A collection of selectors that results in no nodes being identified will be processed as if 100% of nodes successfully deployed (avoiding division by zero), but would fail the minimum or maximum nodes criteria (it still counts as 0 nodes).
- There is no validation against the same node being in multiple groups; however, the workflow will not resubmit to Drydock nodes that have already completed or failed in this deployment, since it keeps track of each node uniquely. The success or failure of those nodes excluded from submission to Drydock will still be used for the success criteria calculation.
E.g.:
selectors:
- node_names:
- node01
- node02
rack_names:
- rack01
node_tags:
- control
- node_names:
- node04
node_labels:
- ucp_control_plane: enabled
Will indicate (not really SQL, just for illustration):
SELECT nodes
WHERE node_name in ('node01', 'node02')
AND rack_name in ('rack01')
AND node_tags in ('control')
UNION
SELECT nodes
WHERE node_name in ('node04')
AND node_label in ('ucp_control_plane: enabled')
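The same semantics can be sketched in code. This is illustrative only; the node record shape is invented here, and in practice the matching is performed by Drydock's /nodefilter endpoint:

```python
def node_matches(node, selector):
    """True when the node satisfies every non-empty criterion (intersection)."""
    def criterion_ok(wanted, node_values):
        # An omitted or empty criterion is ignored; otherwise require overlap.
        return not wanted or bool(set(wanted) & set(node_values))
    return all([
        criterion_ok(selector.get("node_names", []), [node["name"]]),
        criterion_ok(selector.get("rack_names", []), [node["rack"]]),
        criterion_ok(selector.get("node_tags", []), node["tags"]),
        criterion_ok(selector.get("node_labels", []), node["labels"]),
    ])

def group_nodes(nodes, selectors):
    """Union across selectors; an empty selector list means ALL nodes."""
    if not selectors:
        return list(nodes)
    return [n for n in nodes if any(node_matches(n, s) for s in selectors)]

# Invented inventory matching the selectors in the example above:
nodes = [
    {"name": "node01", "rack": "rack01", "tags": ["control"], "labels": []},
    {"name": "node03", "rack": "rack02", "tags": ["compute"], "labels": []},
    {"name": "node04", "rack": "rack02", "tags": ["compute"],
     "labels": ["ucp_control_plane: enabled"]},
]
selectors = [
    {"node_names": ["node01", "node02"], "rack_names": ["rack01"],
     "node_tags": ["control"]},
    {"node_names": ["node04"], "node_labels": ["ucp_control_plane: enabled"]},
]
# group_nodes(nodes, selectors) yields node01 (first selector) and
# node04 (second selector); node03 matches neither.
```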
Success Criteria¶
- Field: success_criteria
- Valid values: for possible values, see below
Each group optionally contains success criteria, which are used to indicate whether the deployment of that group is successful. The values that may be specified:
- percent_successful_nodes
- The calculated success rate of nodes completing the deployment phase. E.g.: 75 would mean that 3 of 4 nodes must complete the phase successfully. This is useful for groups that have larger numbers of nodes, and do not have critical minimums or are not sensitive to an arbitrary number of nodes not working.
- minimum_successful_nodes
- An integer indicating how many nodes must complete the phase to be considered successful.
- maximum_failed_nodes
- An integer indicating a number of nodes that are allowed to have failed the deployment phase and still consider that group successful.
When no criteria are specified, it means that no checks are done - processing continues as if nothing is wrong.
When more than one criterion is specified, each is evaluated separately - if any fail, the group is considered failed.
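The evaluation rules above can be sketched as follows; this is an illustrative model, not Shipyard's implementation:

```python
def criteria_met(criteria, succeeded, failed):
    """True when every specified criterion passes; no criteria means no checks."""
    total = succeeded + failed
    checks = []
    if "percent_successful_nodes" in criteria:
        # A group that resolved to zero nodes is treated as 100% successful.
        pct = 100.0 if total == 0 else 100.0 * succeeded / total
        checks.append(pct >= criteria["percent_successful_nodes"])
    if "minimum_successful_nodes" in criteria:
        checks.append(succeeded >= criteria["minimum_successful_nodes"])
    if "maximum_failed_nodes" in criteria:
        checks.append(failed <= criteria["maximum_failed_nodes"])
    return all(checks)  # all([]) is True: no criteria, no checks

# The control-nodes criteria from the example document below:
control = {"percent_successful_nodes": 90,
           "minimum_successful_nodes": 3,
           "maximum_failed_nodes": 1}
criteria_met(control, succeeded=3, failed=1)  # 75% < 90 -> False
criteria_met(control, succeeded=4, failed=0)  # -> True
```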
Example Deployment Strategy Document¶
This example shows a contrived deployment strategy with 5 groups: control-nodes, compute-nodes-1, compute-nodes-2, monitoring-nodes, and ntp-node.
---
schema: shipyard/DeploymentStrategy/v1
metadata:
schema: metadata/Document/v1
name: deployment-strategy
layeringDefinition:
abstract: false
layer: global
storagePolicy: cleartext
data:
groups:
- name: control-nodes
critical: true
depends_on:
- ntp-node
selectors:
- node_names: []
node_labels: []
node_tags:
- control
rack_names:
- rack03
success_criteria:
percent_successful_nodes: 90
minimum_successful_nodes: 3
maximum_failed_nodes: 1
- name: compute-nodes-1
critical: false
depends_on:
- control-nodes
selectors:
- node_names: []
node_labels: []
rack_names:
- rack01
node_tags:
- compute
success_criteria:
percent_successful_nodes: 50
- name: compute-nodes-2
critical: false
depends_on:
- control-nodes
selectors:
- node_names: []
node_labels: []
rack_names:
- rack02
node_tags:
- compute
success_criteria:
percent_successful_nodes: 50
- name: monitoring-nodes
critical: false
depends_on: []
selectors:
- node_names: []
node_labels: []
node_tags:
- monitoring
rack_names:
- rack03
- rack02
- rack01
- name: ntp-node
critical: true
depends_on: []
selectors:
- node_names:
- ntp01
node_labels: []
node_tags: []
rack_names: []
success_criteria:
minimum_successful_nodes: 1
The ordering of groups, as defined by the dependencies (depends_on fields):
__________ __________________
| ntp-node | | monitoring-nodes |
---------- ------------------
|
____V__________
| control-nodes |
---------------
|_________________________
| |
______V__________ ______V__________
| compute-nodes-1 | | compute-nodes-2 |
----------------- -----------------
Given this, the order of execution could be any of the following:
- ntp-node > monitoring-nodes > control-nodes > compute-nodes-1 > compute-nodes-2
- ntp-node > control-nodes > compute-nodes-2 > compute-nodes-1 > monitoring-nodes
- monitoring-nodes > ntp-node > control-nodes > compute-nodes-1 > compute-nodes-2
- and many more ... the only guarantee is that ntp-node will run some time before control-nodes, which will run sometime before both of the compute-nodes. Monitoring-nodes can run at any time.
Also of note are the various combinations of selectors and the varied use of success criteria.
Example Processing¶
Using the defined deployment strategy in the above example, the following is an example of how it may process:
Start
|
| prepare ntp-node <SUCCESS>
| deploy ntp-node <SUCCESS>
V
| prepare control-nodes <SUCCESS>
| deploy control-nodes <SUCCESS>
V
| prepare monitoring-nodes <SUCCESS>
| deploy monitoring-nodes <SUCCESS>
V
| prepare compute-nodes-2 <SUCCESS>
| deploy compute-nodes-2 <SUCCESS>
V
| prepare compute-nodes-1 <SUCCESS>
| deploy compute-nodes-1 <SUCCESS>
|
Finish (success)
If there were a failure in preparing the ntp-node, the following would be the result:
Start
|
| prepare ntp-node <FAILED>
| deploy ntp-node <FAILED, due to prepare failure>
V
| prepare control-nodes <FAILED, due to dependency>
| deploy control-nodes <FAILED, due to dependency>
V
| prepare monitoring-nodes <SUCCESS>
| deploy monitoring-nodes <SUCCESS>
V
| prepare compute-nodes-2 <FAILED, due to dependency>
| deploy compute-nodes-2 <FAILED, due to dependency>
V
| prepare compute-nodes-1 <FAILED, due to dependency>
| deploy compute-nodes-1 <FAILED, due to dependency>
|
Finish (failed due to critical group failed)
If a failure occurred during the deploy of compute-nodes-2, the following would result:
Start
|
| prepare ntp-node <SUCCESS>
| deploy ntp-node <SUCCESS>
V
| prepare control-nodes <SUCCESS>
| deploy control-nodes <SUCCESS>
V
| prepare monitoring-nodes <SUCCESS>
| deploy monitoring-nodes <SUCCESS>
V
| prepare compute-nodes-2 <SUCCESS>
| deploy compute-nodes-2 <FAILED, non critical group>
V
| prepare compute-nodes-1 <SUCCESS>
| deploy compute-nodes-1 <SUCCESS>
|
Finish (success with some nodes/groups failed)
Important Points¶
- By default, the deployment strategy is all-at-once, requiring total success.
- Critical group failures halt the deployment activity AFTER processing all nodes, but before proceeding to deployment of the software using Armada.
- Success Criteria are evaluated at the end of processing of each of two phases for each group. A failure in a parent group indicates a failure for child groups - those children will not be processed.
- Group processing is serial.
Interactions¶
During the processing of nodes, the workflow interacts with Drydock using the node filter mechanism provided in the Drydock API. When formulating the nodes to process in a group, Shipyard will make an inquiry of Drydock's /nodefilter endpoint to get the list of nodes that match the selectors for the group.
Shipyard will keep track of nodes that are actionable for each group using the response from Drydock, as well as prior group inquiries. This means that any nodes processed in a group will not be reprocessed in a later group, but will still count toward that group's success criteria.
Two actions (prepare, deploy) will be invoked against Drydock during the actual node preparation and deployment. The workflow will monitor the tasks created by Drydock and keep track of the successes and failures.
At the end of processing, the workflow step will report the success status for each group and each node. Processing will either stop or continue depending on the success of critical groups.
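The actionable-node bookkeeping described above can be sketched as follows. This is an illustrative model, not Shipyard's code; deploy() merely stands in for the Drydock tasks, and the node names are invented:

```python
processed = {}  # node name -> "success" | "failed", shared across all groups

def process_group(resolved_nodes, deploy):
    """Submit only not-yet-processed nodes; report results for all resolved nodes."""
    actionable = [n for n in resolved_nodes if n not in processed]
    processed.update(deploy(actionable))  # only new nodes are sent to Drydock
    # Success criteria are evaluated over every resolved node, so prior
    # results still count toward this group's criteria.
    return {n: processed[n] for n in resolved_nodes}

def all_success(nodes):
    return {n: "success" for n in nodes}

process_group(["node4", "node5"], all_success)        # e.g. compute-nodes-1
results = process_group(["node4", "node5", "node7"],  # e.g. all-compute-nodes:
                        all_success)                  # only node7 is actionable
```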
Example beginning of group processing output from a workflow step:
INFO Setting group control-nodes with None -> Stage.NOT_STARTED
INFO Group control-nodes selectors have resolved to nodes: node2, node1
INFO Setting group compute-nodes-1 with None -> Stage.NOT_STARTED
INFO Group compute-nodes-1 selectors have resolved to nodes: node5, node4
INFO Setting group compute-nodes-2 with None -> Stage.NOT_STARTED
INFO Group compute-nodes-2 selectors have resolved to nodes: node7, node8
INFO Setting group spare-compute-nodes with None -> Stage.NOT_STARTED
INFO Group spare-compute-nodes selectors have resolved to nodes: node11, node10
INFO Setting group all-compute-nodes with None -> Stage.NOT_STARTED
INFO Group all-compute-nodes selectors have resolved to nodes: node11, node7, node4, node8, node10, node5
INFO Setting group monitoring-nodes with None -> Stage.NOT_STARTED
INFO Group monitoring-nodes selectors have resolved to nodes: node12, node6, node9
INFO Setting group ntp-node with None -> Stage.NOT_STARTED
INFO Group ntp-node selectors have resolved to nodes: node3
INFO There are no cycles detected in the graph
Of note is the resolution of groups to a list of nodes. Notice that the nodes in all-compute-nodes (e.g. node11) overlap the nodes listed as part of other groups. When processing, if all the other groups were processed before all-compute-nodes, there would be no remaining actionable nodes when the workflow tries to process all-compute-nodes. The all-compute-nodes group would then be evaluated for success criteria immediately, against the nodes processed earlier. E.g.:
INFO There were no actionable nodes for group all-compute-nodes. It is possible that all nodes: [node11, node7, node4, node8, node10, node5] have previously been deployed. Group will be immediately checked against its success criteria
INFO Assessing success criteria for group all-compute-nodes
INFO Group all-compute-nodes success criteria passed
INFO Setting group all-compute-nodes with Stage.NOT_STARTED -> Stage.PREPARED
INFO Group all-compute-nodes has met its success criteria and is now set to stage Stage.PREPARED
INFO Assessing success criteria for group all-compute-nodes
INFO Group all-compute-nodes success criteria passed
INFO Setting group all-compute-nodes with Stage.PREPARED -> Stage.DEPLOYED
INFO Group all-compute-nodes has met its success criteria and is successfully deployed (Stage.DEPLOYED)
Example summary output from workflow step doing node processing:
INFO ===== Group Summary =====
INFO Group monitoring-nodes ended with stage: Stage.DEPLOYED
INFO Group ntp-node [Critical] ended with stage: Stage.DEPLOYED
INFO Group control-nodes [Critical] ended with stage: Stage.DEPLOYED
INFO Group compute-nodes-1 ended with stage: Stage.DEPLOYED
INFO Group compute-nodes-2 ended with stage: Stage.DEPLOYED
INFO Group spare-compute-nodes ended with stage: Stage.DEPLOYED
INFO Group all-compute-nodes ended with stage: Stage.DEPLOYED
INFO ===== End Group Summary =====
INFO ===== Node Summary =====
INFO Nodes Stage.NOT_STARTED:
INFO Nodes Stage.PREPARED:
INFO Nodes Stage.DEPLOYED: node11, node7, node3, node4, node2, node1, node12, node8, node9, node6, node10, node5
INFO Nodes Stage.FAILED:
INFO ===== End Node Summary =====
INFO All critical groups have met their success criteria
Overall success or failure of workflow step processing based on critical groups meeting or failing their success criteria will be reflected in the same fashion as any other workflow step output from Shipyard.
An example of CLI describe action command output, with failed processing:
$ shipyard describe action/01BZZK07NF04XPC5F4SCTHNPKN
Name: deploy_site
Action: action/01BZZK07NF04XPC5F4SCTHNPKN
Lifecycle: Failed
Parameters: {}
Datetime: 2017-11-27 20:34:24.610604+00:00
Dag Status: failed
Context Marker: 71d4112e-8b6d-44e8-9617-d9587231ffba
User: shipyard
Steps Index State
step/01BZZK07NF04XPC5F4SCTHNPKN/dag_concurrency_check 1 success
step/01BZZK07NF04XPC5F4SCTHNPKN/validate_site_design 2 success
step/01BZZK07NF04XPC5F4SCTHNPKN/drydock_build 3 failed
step/01BZZK07NF04XPC5F4SCTHNPKN/armada_build 4 None
step/01BZZK07NF04XPC5F4SCTHNPKN/drydock_prepare_site 5 success
step/01BZZK07NF04XPC5F4SCTHNPKN/drydock_nodes 6 failed
Deployment Version¶
A deployment version document is a Pegleg-generated document that captures information about the repositories used to generate the site definition. The presence of this document is optional by default, but Shipyard can be configured to require it, issuing a warning or an error if it is absent from a configdocs collection. Document example:
---
schema: pegleg/DeploymentData/v1
metadata:
  schema: metadata/Document/v1
  name: deployment-version
  layeringDefinition:
    abstract: false
    layer: global
  storagePolicy: cleartext
data:
  documents:
    site-repository:
      commit: 37260deff6a213e30897fc284a993c791336a99d
      tag: master
      dirty: false
    repository-of-secrets:
      commit: 23e7265aee4843301807d649036f8e860fda0cda
      tag: master
      dirty: false
Currently, Shipyard does not use this document. Use of its data will be added in a future version of Shipyard/Airship.
Note, the name and schema Shipyard expects this document to have can be configured via the document_info section in the Shipyard configuration.
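The presence check described above can be sketched as follows. This is a hedged, illustrative sketch: the function, its parameters, and the behavior values are hypothetical, while the default schema and name are those noted in the text (both configurable via the document_info section of the Shipyard configuration).

```python
# Sketch only: hypothetical helper, not Shipyard's actual validation code.
def check_deployment_version(documents, behavior="info",
                             expected_schema="pegleg/DeploymentData/v1",
                             expected_name="deployment-version"):
    """Return (ok, message) for a list of parsed configdocs documents.

    behavior is a hypothetical knob modeling the configurable response:
    "info"/"warning" tolerate absence, "error" fails the check.
    """
    present = any(d.get("schema") == expected_schema and
                  d.get("metadata", {}).get("name") == expected_name
                  for d in documents)
    if present:
        return True, "deployment version document found"
    msg = "deployment version document is missing"
    if behavior == "error":
        return False, msg
    return True, msg  # absence is tolerated, but noted
```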
Shipyard Client User's Guide¶
Shipyard provides three methods of interaction:
- An API - Shipyard API
- An API Client - api-client
- A Command Line Interface (CLI) - Shipyard CLI
Each of these components builds upon the previous one.
_________Client_________ __Server__
| | | |
| CLI -uses-> API Client | -calls-> | API |
|________________________| |__________|
This guide focuses on interaction with Shipyard via the CLI.
CLI Invocation Flow¶
It is useful to first understand the flow of a request made using the Shipyard CLI. Several steps occur with each invocation. This example demonstrates the flow of a Shipyard CLI invocation.
Step 1: Invocation
:
User --> CLI
e.g.:
$ shipyard get actions
As noted in CLI documentation, Shipyard handles authentication by leveraging OpenStack's Keystone identity service. The CLI provides command line options to specify credentials, or extracts them from the environment. For the example started above, since the credentials are not specified, they would need to be set in the environment prior to invocation. The credentials, regardless of source, are passed from the CLI software to the API Client software.
Step 2: API Client secures an authentication token
:
API Client --> Keystone authentication
/
(Auth Token) <---------
Shipyard API Client calls Keystone to acquire an authentication token.
Step 3: API Client discovers Shipyard
:
API Client --> Keystone service discovery
/
(Shipyard URL) <-------
Shipyard API Client calls Keystone to find the URL for the Shipyard API.
Step 4: API Client invokes the appropriate Shipyard API
:
API Client --> Shipyard API <--> Database, Airflow, etc...
/
(JSON response) <------
As noted in the CLI documentation, some responses are YAML instead of JSON.
Step 5: CLI formats response
:
User <-- (Formatted Response) <-- CLI <-- (JSON response)
e.g.:
Name Action Lifecycle
deploy_site action/01BZZK07NF04XPC5F4SCTHNPKN Failed
update_site action/01BZZKMW60DV2CJZ858QZ93HRS Processing
The CLI maps the JSON response from the Shipyard API into a more tabular format and presents it to the user.
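Step 5 can be sketched in isolation, since it needs no network access. This is an illustrative reimplementation, not the CLI's actual formatting code: it takes a list of action records like those in the JSON response and renders the tabular output shown above, with column widths sized to the data.

```python
# Sketch of step 5 only: mapping JSON-derived records to tabular output.
# Column names mirror the example above; the real CLI may format differently.
def format_actions(actions):
    rows = [("Name", "Action", "Lifecycle")]
    rows += [(a["name"], a["action"], a["lifecycle"]) for a in actions]
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows)
```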
Setup¶
Server Components¶
Use of the Shipyard client requires that a working installation of the Shipyard API is available. See the Deployment Guide.
Local Environment¶
Several setup items may be required to allow for an operational Shipyard CLI, including several work-arounds depending on how the Shipyard API is deployed.
- Prerequisites:
- Python 3.5+
- git
Note
It is recommended that a virtual environment set up with Python 3.5 be used to contain the dependencies and installation of the Shipyard client.
Retrieve Shipyard client software from git
:
git clone --depth=1 https://github.com/openstack/airship-shipyard.git
Install requirements
:
sudo apt install python3-pip -y
sudo pip3 install --upgrade pip
cd airship-shipyard
pip3 install -r requirements.txt
Build/install Shipyard
:
python3 setup.py install
At this point, invoking shipyard as a command should result in a basic help response:
$ shipyard
Usage: shipyard [OPTIONS] COMMAND [ARGS]...
COMMAND: shipyard
DESCRIPTION: The base shipyard command supports options that determine
...
Setup environment variables
:
export OS_USER_DOMAIN_NAME=default
export OS_PROJECT_DOMAIN_NAME=default
export OS_PROJECT_NAME=service
export OS_USERNAME=shipyard
export OS_PASSWORD=password
export OS_AUTH_URL=http://keystone.ucp:80/v3
- The values of these variables should match the credentials and endpoint of the target Shipyard API/Keystone environment.
- The shipyard and password values are the insecure defaults used if not overridden by the installation of Shipyard.
- The dev_minimal manifest deployment from Airship-in-a-bottle referenced in the deployment guide provides a set of credentials that can be used.
Configure hosts file, if necessary
:
Add to /etc/hosts:
10.96.0.44 keystone.ucp
10.96.0.44 shipyard-api.ucp.svc.cluster.local
- These values would need to be set in the case where DNS resolution of the Keystone and Shipyard URLs is not available.
- The IP addresses should be set to resolve to the IP address of the ingress controller for the target Shipyard API/Keystone environment.
- The value listed as keystone.ucp needs to match the value set for OS_AUTH_URL.
- The value listed as shipyard-api.ucp.svc.cluster.local needs to match the value that Keystone returns when service lookup is done for the public URL for Shipyard.
Running Shipyard CLI with Docker Container¶
It is also possible to execute the Shipyard CLI using a Docker container. Note that the relevant environment information, as well as the Shipyard command to execute, must be passed as part of the docker run command. In this example we execute the get actions command:
sudo docker run -e 'OS_AUTH_URL=http://keystone-api.ucp.svc.cluster.local:80/v3' \
-e 'OS_PASSWORD=password' -e 'OS_PROJECT_DOMAIN_NAME=default' \
-e 'OS_PROJECT_NAME=service' -e 'OS_USERNAME=shipyard' \
-e 'OS_USER_DOMAIN_NAME=default' -e 'OS_IDENTITY_API_VERSION=3' \
--rm --net=host airshipit/shipyard:latest-ubuntu_focal get actions
The output will resemble the following:
+ CMD=shipyard
+ PORT=9000
+ '[' get = server ']'
+ exec shipyard get actions
Name Action Lifecycle
deploy_site action/01C1Z4HQM8RFG823EQT3EAYE4X Processing
Use Case: Ingest Site Design¶
Shipyard serves as the entry point for a deployment of Airship. One can imagine the following activities representing part of the lifecycle of a group of servers for which Airship would serve as the control plane:
- Definition: A group of servers making up a site has been identified. Designs covering the hardware, network, and software are assembled.
- Preparation: The site is assembled: racking and wiring are completed, and the hardware is readied for operation. The Genesis Node is preinstalled with an (Ubuntu 18.04) image. Airship is deployed; see the Deployment Guide. At this point, Airship is ready for use; this is when the Shipyard API becomes available.
- Load Configuration Documents: A user, deployment engineer, or automation -- i.e. the operator -- interacts with Shipyard, perhaps by using the CLI. The operator loads configdocs, which are a product of the definition step. These configdocs are a declarative set of YAML documents using a format compatible with Deckhand and containing information usable by the other Airship components.
The interaction with Shipyard could happen as follows:
$ git clone --depth=1 https://gitrepo.with.designs/site1.git
Note
Assume: /home/user/site1 now contains .yaml files with Drydock, Promenade, Armada, and Divingbell configurations, as well as secrets such as certificates, CAs, and passwords.
Note
Assume: the appropriate credentials are set in the environment
$ shipyard create configdocs site1 --directory=/home/user/site1
Configuration documents added.
Status: Validations succeeded
Reason: Validation
This loads the documents as a named collection "site1" into Deckhand as a bucket in a revision.
Note
Alternatively, the command could have loaded a single file using --filename=<file>.yaml instead of the --directory option.
Following the creation of a configdocs collection in the Shipyard buffer, the configdocs must be committed before Shipyard will use those documents as part of an action:
$ shipyard commit configdocs
During this command, the other Airship components are contacted to validate the designs in Deckhand. If the validations are not successful, Shipyard will not mark the revision as committed.
Important
It is not necessary to load all configuration documents in one step, but each named collection may only exist as a complete set of documents (i.e. must be loaded together).
Important
Shipyard will prevent the loading of two collections into the buffer at the same time unless --append is utilized. This option allows for the loading of multiple collections into the buffer to be later committed together.
An example of this is a base collection that defines some common design elements, a secrets collection that contains certificates, and a site-specific collection that combines with the other two collections to fully define the site.
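The buffer rules above can be modeled with a small sketch. This is a hypothetical helper under stated assumptions (the class and method names are illustrative, not Shipyard's API): a second collection may only enter the buffer when append is requested, and committing drains the buffer as one unit.

```python
# Illustrative model of the configdocs buffer semantics; not Shipyard's code.
class ConfigdocsBuffer:
    def __init__(self):
        self.collections = {}

    def load(self, name, docs, append=False):
        """Load a named collection; a second collection requires append."""
        if self.collections and not append:
            raise ValueError(
                "buffer already holds a collection; use --append to add more")
        self.collections[name] = docs

    def commit(self):
        """Commit all buffered collections together and empty the buffer."""
        committed = dict(self.collections)
        self.collections = {}
        return committed
```

For instance, a base collection, a secrets collection, and a site-specific collection could be loaded in turn (the latter two with append=True) and then committed together.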
Use Case: Deploy Site¶
Continuing the lifecycle steps from the Ingest Site Design use case, the operator proceeds with the deployment of the site.
- Deployment: The operator creates a deploy_site action and monitors its progress.
- Maintenance: The operator loads new or changed configuration documents (as above), commits them, and creates an update_site action.
The deployment interactions with Shipyard could happen as follows:
$ shipyard create action deploy_site
Name Action Lifecycle
deploy_site action/01BZZK07NF04XPC5F4SCTHNPKN None
The deploy_site action is issued to Shipyard which relays a command to the Airflow driven workflow processor. During and following execution of the action, the operator can query the status and results of the action:
$ shipyard get actions
Name Action Lifecycle
deploy_site action/01BZZK07NF04XPC5F4SCTHNPKN Processing
$ shipyard describe action/01BZZK07NF04XPC5F4SCTHNPKN
Name: deploy_site
Action: action/01BZZK07NF04XPC5F4SCTHNPKN
Lifecycle: Processing
Parameters: {}
Datetime: 2017-11-27 20:34:24.610604+00:00
Dag Status: running
Context Marker: 71d4112e-8b6d-44e8-9617-d9587231ffba
User: shipyard
Steps Index State
step/01BZZK07NF04XPC5F4SCTHNPKN/action_xcom 1 success
step/01BZZK07NF04XPC5F4SCTHNPKN/dag_concurrency_check 2 success
...
More information is returned than shown here; the output is abbreviated for brevity. The process of maintenance (update_site) is very similar to the process of deploying a site.
Deployment Guide¶
Note
Shipyard is still under active development and this guide will evolve along the way
Deployment¶
The current deployment makes use of the airship-in-a-bottle project to set up the underlying Kubernetes infrastructure, container networking (Calico), disk provisioner (Ceph or NFS), and the Airship components that are used by Shipyard.
The dev_minimal manifest is the recommended manifest. Please see the README.txt that exists in that manifest's directory.
This approach sets up an 'All-In-One' Airship environment that allows developers to bring up Shipyard and the rest of the Airship components on a single Ubuntu Virtual Machine.
The deployment is fully automated and can take a while to complete: 30 minutes to an hour or more for a full deployment.
Post Deployment¶
The environment should include the following after executing the required steps:
# sudo kubectl get pods -n ucp | grep -v Completed
NAME                                   READY  STATUS   RESTARTS  AGE
airflow-scheduler-79754bfdd5-2wpxn     1/1    Running  0         4m
airflow-web-7679866685-g99qm           1/1    Running  0         4m
airflow-worker-0                       3/3    Running  0         4m
airship-ucp-keystone-memcached-mem...  1/1    Running  0         31m
airship-ucp-rabbitmq-rabbitmq-0        1/1    Running  0         35m
armada-api-5488cbdb99-zjb8n            1/1    Running  0         12m
barbican-api-5fc8f7d6f-s7h7j           1/1    Running  0         11m
deckhand-api-7b476d6c46-qlvtm          1/1    Running  0         8m
drydock-api-5f9fdc858d-lnxvj           1/1    Running  0         1m
ingress-6cd5b89d5d-hzfzj               1/1    Running  0         35m
ingress-error-pages-5c97bb46bb-zqqbx   1/1    Running  0         35m
keystone-api-7657986b8c-6bf92          1/1    Running  0         31m
maas-ingress-66447d7445-mgklj          2/2    Running  0         27m
maas-ingress-errors-8686d56d98-vrjzg   1/1    Running  0         27m
maas-rack-0                            1/1    Running  0         27m
maas-region-0                          2/2    Running  0         27m
mariadb-ingress-6c4f9c76f-lk9ff        1/1    Running  0         35m
mariadb-ingress-6c4f9c76f-ns5kj        1/1    Running  0         35m
mariadb-ingress-error-pages-5dd6fb...  1/1    Running  0         35m
mariadb-server-0                       1/1    Running  0         35m
postgresql-0                           1/1    Running  0         32m
promenade-api-764b765d77-ffhv4         1/1    Running  0         7m
shipyard-api-69888d9f68-8ljfk          1/1    Running  0         4m
Sample Policy File¶
The following is a sample Shipyard policy file for adaptation and use. It is auto-generated from Shipyard when this documentation is built, so if you are having issues with an option, please compare your version of Shipyard with the version of this documentation.
The sample policy file can also be viewed in file form.
# Actions requiring admin authority
#"admin_required": "role:admin"
# Rule to deny all access. Used for default denial
#"deny_all": "!"
# List workflow actions invoked by users
# GET /api/v1.0/actions
#"workflow_orchestrator:list_actions": "rule:admin_required"
# Create a workflow action
# POST /api/v1.0/actions
#"workflow_orchestrator:create_action": "rule:admin_required"
# Retrieve an action by its id
# GET /api/v1.0/actions/{action_id}
#"workflow_orchestrator:get_action": "rule:admin_required"
# Retrieve an action step by its id
# GET /api/v1.0/actions/{action_id}/steps/{step_id}
#"workflow_orchestrator:get_action_step": "rule:admin_required"
# Retrieve logs of an action step by its id
# GET /api/v1.0/actions/{action_id}/steps/{step_id}/logs
#"workflow_orchestrator:get_action_step_logs": "rule:admin_required"
# Retrieve an action validation by its id
# GET /api/v1.0/actions/{action_id}/validations/{validation_id}
#"workflow_orchestrator:get_action_validation": "rule:admin_required"
# Send a control to an action
# POST /api/v1.0/actions/{action_id}/control/{control_verb}
#"workflow_orchestrator:invoke_action_control": "rule:admin_required"
# Retrieve the status of the configdocs
# GET /api/v1.0/configdocs
#"workflow_orchestrator:get_configdocs_status": "rule:admin_required"
# Ingest configuration documents for the site design
# POST /api/v1.0/configdocs/{collection_id}
#"workflow_orchestrator:create_configdocs": "rule:admin_required"
# Retrieve a collection of configuration documents with redacted
# secrets
# GET /api/v1.0/configdocs/{collection_id}
#"workflow_orchestrator:get_configdocs": "rule:admin_required"
# Retrieve a collection of configuration documents with cleartext
# secrets.
# GET /api/v1.0/configdocs/{collection_id}
#"workflow_orchestrator:get_configdocs_cleartext": "rule:admin_required"
# Move documents from the Shipyard buffer to the committed documents
# POST /api/v1.0/commitconfigdocs
#"workflow_orchestrator:commit_configdocs": "rule:admin_required"
# Retrieve the configuration documents rendered by Deckhand into a
# complete design
# GET /api/v1.0/renderedconfigdocs
#"workflow_orchestrator:get_renderedconfigdocs": "rule:admin_required"
# Retrieve the configuration documents with cleartext secrets rendered
# by Deckhand into a complete design
# GET /api/v1.0/renderedconfigdocs
#"workflow_orchestrator:get_renderedconfigdocs_cleartext": "rule:admin_required"
# Retrieve the list of workflows (DAGs) that have been invoked in
# Airflow, whether via Shipyard or scheduled
# GET /api/v1.0/workflows
#"workflow_orchestrator:list_workflows": "rule:admin_required"
# Retrieve the detailed information for a workflow (DAG) from Airflow
# GET /api/v1.0/workflows/{id}
#"workflow_orchestrator:get_workflow": "rule:admin_required"
# Retrieve the details for a note. Further authorization is required
# depending on the topic of the note
# GET /api/v1.0/notedetails/{note_id}
#"workflow_orchestrator:get_notedetails": "rule:admin_required"
# Retrieve the statuses for the site
# GET /api/v1.0/site_statuses
#"workflow_orchestrator:get_site_statuses": "rule:admin_required"
# Create a workflow action to deploy the site
# POST /api/v1.0/actions
#"workflow_orchestrator:action_deploy_site": "rule:admin_required"
# Create a workflow action to update the site
# POST /api/v1.0/actions
#"workflow_orchestrator:action_update_site": "rule:admin_required"
# Create a workflow action to update the site software
# POST /api/v1.0/actions
#"workflow_orchestrator:action_update_software": "rule:admin_required"
# Create a workflow action to redeploy target servers
# POST /api/v1.0/actions
#"workflow_orchestrator:action_redeploy_server": "rule:admin_required"
# Create a workflow action to relabel target nodes
# POST /api/v1.0/actions
#"workflow_orchestrator:action_relabel_nodes": "rule:admin_required"
# Create a workflow action to invoke Helm tests on all releases or a
# targeted release
# POST /api/v1.0/actions
#"workflow_orchestrator:action_test_site": "rule:admin_required"
Multiple Distro Support¶
This project builds images for the Shipyard and Airflow components. Currently, it supports building images for Ubuntu and openSUSE (Leap 15.1 as the base image).
By default, Ubuntu images are built and published to a public registry server. Support for publishing openSUSE images has also been added recently.
To build openSUSE images locally, the following parameters can be passed to the make command, run in the Shipyard repository's root directory with images as the target:
DISTRO: opensuse_15
DISTRO_BASE_IMAGE: "opensuse/leap:15.1"
DOCKER_REGISTRY: { your_docker_registry }
IMAGE_TAG: latest
IMAGE_NAME: airflow
PUSH_IMAGE: false
The following is an example command to build and publish images locally. It is run in the Shipyard repository's root directory:
DISTRO=opensuse_15 DOCKER_REGISTRY={ your_docker_registry } IMAGE_NAME=airflow IMAGE_TAG=latest PUSH_IMAGE=true make images
The following parameters need to be passed as environment/shell variables to the make command:
- DISTRO: identifies the distro-specific Dockerfile; ubuntu_focal (default)
- DISTRO_BASE_IMAGE: uses a base image other than the one in the DISTRO-specific Dockerfile (optional)
- DOCKER_REGISTRY: specifies a local/internal docker registry when images need to be published (optional); quay.io (default)
- IMAGE_TAG: tag to be used for the built image; untagged (default)
- PUSH_IMAGE: flag indicating whether images need to be pushed to a docker registry; false (default)
This work is done as per the approved spec multi_distro_support. Currently, only the image-building logic has been enhanced to support multiple distros.
Adding New Distro Support¶
To add support for building images for a new distro, follow these steps:
- Shipyard builds images for the shipyard and airflow components, so two Dockerfiles are required, one for each component.
- Add a distro-specific Dockerfile for each component containing the steps to install the necessary packages and configure the run environment. Use an existing Dockerfile as a sample to identify the needed packages and environment information.
- Name each new Dockerfile Dockerfile.{DISTRO}, where DISTRO is the distro identifier passed to the makefile.
- Place the respective Dockerfiles in {shipyard_root}/images/airflow and {shipyard_root}/images/shipyard.
- Add check, gate, and post jobs for building, testing, and publishing the images. These entries need to be added in the {shipyard_root}/.zuul.yaml file. Refer to the existing opensuse entries in that file as examples.
- Add any relevant information to this document.
Building this Documentation¶
Use make docs or tox -e docs to generate these docs. This will build an HTML version of this documentation that can be viewed using a browser at doc/build/index.html on the local filesystem.