REST API Design for Long-Running Tasks

A long-running task is an operation that requires a considerable amount of server resources, time, or both. To avoid blocking the client, the server must complete the task asynchronously, without holding a persistent connection open between the client and the server.

After submitting the task, the client polls a provided URL for the task's execution progress. If WebSockets are supported, the server can instead notify the client over a WebSocket connection when execution finishes, which removes the need for polling.

Although Roy Fielding does not discuss long-running tasks in his dissertation, we can combine REST principles with the HTTP RFCs to design a viable solution.

1. General Approach for a Long-Running API

Most APIs that support long-running operations follow this approach:

Resource Creation and Initiation: Create a resource to represent the initiation of a long-running task. This resource could be a unique URI that clients can POST a request to in order to start a task. The request body may include parameters or data needed to define the task.

POST /api/tasks

Status: 202 Accepted
Location: /api/tasks/12345
Content-Type: application/json

{
  "taskId": 12345,
  "status": "pending",
  "createdAt": "2023-11-04T10:00:00Z"
}

Task Status Resource: Once the task has been submitted, create a resource to represent the current status of the long-running task. This resource allows clients to check the progress and outcome of the task. We may document the various possible statuses of the tasks so the client can act accordingly.

GET /api/tasks/12345

Status: 200 OK
Content-Type: application/json

{
  "taskId": 12345,
  "status": "in progress",
  "createdAt": "2023-11-04T10:00:00Z",
  "progress": {
    "percentage": 45,
    "currentStep": "data transformation"
  }
}

Result Reporting: Once the task is completed, the status resource returns a "completed" status along with the URI of the detailed result or the URI of the newly created resource, if any.

GET /api/tasks/12345

Status: 200 OK
Content-Type: application/json

{
  "taskId": 12345,
  "status": "completed",
  "createdAt": "2023-11-04T10:00:00Z",
  "completedAt": "2023-11-04T10:15:00Z",
  "result": {
    "newResource": "https://example.com/resources/123-456"
  }
}
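The three steps above can be sketched as plain functions that return a status code, headers, and a JSON body. This is only an illustrative outline, not a production server: the names submit_task, update_progress, complete_task, get_task, and the in-memory TASKS store are hypothetical, and the sketch assumes the status resource lives under /api/tasks/{id}, matching the GET requests shown above.

```python
import itertools
import json
from datetime import datetime, timezone

# Hypothetical in-memory task store; a real service would use a database or queue.
TASKS = {}
_next_id = itertools.count(12345)


def _now():
    """Current UTC time in the ISO 8601 form used by the sample responses."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def _public_view(task):
    """Hide internal fields (the submitted parameters) from API responses."""
    return {k: v for k, v in task.items() if k != "params"}


def submit_task(params):
    """Handle POST /api/tasks: register the task and answer 202 Accepted."""
    task_id = next(_next_id)
    TASKS[task_id] = {
        "taskId": task_id,
        "status": "pending",
        "createdAt": _now(),
        "params": params,
    }
    headers = {
        "Location": f"/api/tasks/{task_id}",
        "Content-Type": "application/json",
    }
    return 202, headers, json.dumps(_public_view(TASKS[task_id]))


def update_progress(task_id, percentage, current_step):
    """Called by a (hypothetical) background worker as it advances."""
    task = TASKS[task_id]
    task["status"] = "in progress"
    task["progress"] = {"percentage": percentage, "currentStep": current_step}


def complete_task(task_id, new_resource_url):
    """Mark the task as completed and attach the link to its result."""
    task = TASKS[task_id]
    task["status"] = "completed"
    task["completedAt"] = _now()
    task.pop("progress", None)
    task["result"] = {"newResource": new_resource_url}


def get_task(task_id):
    """Handle GET /api/tasks/{id}: report the current status."""
    headers = {"Content-Type": "application/json"}
    task = TASKS.get(task_id)
    if task is None:
        return 404, headers, json.dumps({"error": "task not found"})
    return 200, headers, json.dumps(_public_view(task))
```

A worker would call update_progress and complete_task as it runs, while clients only ever see the output of submit_task and get_task.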

2. HTTP Status Code from Task Submission

When a long-running task is submitted via a REST API, the choice of HTTP status codes varies depending on several factors.

Status Code	Description
202 Accepted	Indicates that the request has been accepted for processing, but the processing is not yet complete. The response may include a link to a resource, or a Location header, where the client can check the status of the task.
303 See Other	Indicates that the client should look for the status of the long-running task at a different location (URI) given in the Location header. This is useful when we want to redirect the client to a resource where the task status can be checked.

The most suitable status code to return when a long-running task is submitted is HTTP 202 Accepted. HTTP has no facility for re-sending a status code from an asynchronous operation once the request has been accepted, so the final outcome must be reported through a separate status resource.

According to RFC 9110 (which obsoletes RFC 7231), the 202 (Accepted) status code indicates that the request has been accepted for processing, but the processing has not been completed.

The representation sent with a 202 response should describe the request's current status (typically "submitted") and point to a status monitor, usually through the Location header, that can give the client an estimate of when the request will be fulfilled.

3. API Response Structure

The response from a long-running API should contain only the information needed to describe the current status of the submitted task. The following example shows an API that accepts scripts to run on a set of devices.

HTTP POST: /device-management/script-execution/new

{
  "device-ids": [1, 2, 3],
  "script-url": "/temp/test-script.sh"
}

The response to the above request can be as follows, where 123456789 is a randomly generated number identifying the long-running task. Multiple such tasks can be executing on the server at any time.

HTTP Status 202
Location: /device-management/script-execution/123456789

{
  "device-ids": [1, 2, 3],
  "script-url": "/temp/test-script.sh",
  "status": "SUBMITTED"
}

4. Querying the Completion Status of Long-Running Task

After the task has been submitted, the client can poll the URL provided in the Location header to get the current status of the long-running task. A sample response body can be:

HTTP GET: /device-management/script-execution/123456789

{
  "device-ids": [1, 2, 3],
  "script-url": "/temp/test-script.sh",
  "status": "INPROGRESS",
  "percentage": "45%"
}

The task status and percentage change as execution progresses. We can use other status constants and progress indicators as well, depending on project requirements.
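On the client side, polling can be sketched as a small loop. This is a rough outline under stated assumptions: poll_until_done and fetch_status are hypothetical names, fetch_status stands in for whatever HTTP client performs the GET on the Location URL and returns the parsed JSON body, and "FAILED" is an assumed terminal status that does not appear in the samples above.

```python
import time


def poll_until_done(
    fetch_status,
    interval_seconds=2.0,
    max_attempts=30,
    terminal_statuses=("COMPLETE", "FAILED"),  # "FAILED" is an assumed status
    sleep=time.sleep,
):
    """Poll fetch_status() until the task reaches a terminal status.

    fetch_status is a caller-supplied function that issues the GET request
    and returns the response body as a dict. The sleep function is injectable
    so the loop can be exercised without real delays.
    """
    for attempt in range(max_attempts):
        body = fetch_status()
        if body["status"] in terminal_statuses:
            return body
        # Wait between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"task still running after {max_attempts} polls")
```

A fixed interval keeps the sketch short; a real client might prefer exponential backoff or honor a server-provided delay so that many waiting clients do not overload the status endpoint.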

Once the task is finished, the status response can either include the task execution result directly or provide another URL that returns the result.

HTTP GET: /device-management/script-execution/123456789

{
  "device-ids": [1, 2, 3],
  "script-url": "/temp/test-script.sh",
  "status": "COMPLETE",
  "percentage": "100%",
  "result": {
    "id": 123456789,
    "sys-log-location":"/log/….",
    "err-log-location":"/log/….",
    "success-on-devices": [1, 2],
    "failed-on-devices": [3]
  }
}

Alternatively, the execution status response can point to a new location for accessing the result.

HTTP GET: /device-management/script-execution/123456789

{
  "device-ids": [1, 2, 3],
  "script-url": "/temp/test-script.sh",
  "status": "COMPLETE",
  "percentage": "100%",
  "result": "/device-management/script-execution/123456789/result"
}

HTTP GET: /device-management/script-execution/123456789/result

{
  "id": 123456789,
  "sys-log-location":"/log/….",
  "err-log-location":"/log/….",
  "success-on-devices": [1, 2],
  "failed-on-devices": [3]
}
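A client that wants to support both variants, embedded result and linked result, could branch on the type of the result field. The helper below is a hedged sketch: resolve_result and fetch_json are hypothetical names, and fetch_json stands in for any function that performs a GET on a URL and returns the parsed JSON body.

```python
def resolve_result(status_body, fetch_json):
    """Return the task result whether it is embedded or linked.

    status_body is the parsed status response. fetch_json is a
    caller-supplied function that GETs a URL and returns parsed JSON.
    """
    if status_body.get("status") != "COMPLETE":
        raise ValueError("task has not completed yet")
    result = status_body["result"]
    if isinstance(result, str):
        # The status resource points to a separate result resource.
        return fetch_json(result)
    # The result object is embedded directly in the status response.
    return result
```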

Also, consider using a real-time messaging system (such as Apache Kafka) to publish task status updates, so that subscribed clients are notified without polling. Whether this fits generally depends on the type of client:

An API client can subscribe dynamically to the topic URL given in the Location header, so a message queue works well for communication between two API clients.
For communication between a browser and a server-hosted API, a simple REST-style API response is more suitable.

5. Canceling an In-Progress Task

A task can be submitted by mistake, so there must be a way to cancel it to prevent further damage to the system. The request can be canceled partially or fully. The changes made by the task before it was canceled can be either persisted or rolled back. All these decisions depend on the application's requirements and capabilities.

The cancel operation should be idempotent: repeating the same cancel request must leave the task in the same state as sending it once.

A client can send an HTTP DELETE request to the URL provided in the Location header when the task was submitted. The URL contains the task ID (execution ID), which identifies the task to cancel.

HTTP DELETE: /device-management/script-execution/123456789
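One way to keep cancellation idempotent is to treat it as a state change rather than a removal, so that repeated DELETE requests find the task already canceled and change nothing. The sketch below is only one possible design: cancel_execution and the EXECUTIONS store are hypothetical, "CANCELLED" is an assumed status constant, and answering 409 Conflict for an already completed task is a design choice rather than a requirement.

```python
# Hypothetical in-memory store keyed by execution id.
EXECUTIONS = {
    123456789: {"status": "SUBMITTED"},
    987654321: {"status": "COMPLETE"},
}


def cancel_execution(execution_id, store=EXECUTIONS):
    """Handle DELETE /device-management/script-execution/{id}.

    Returns a (http_status, body) pair.
    """
    task = store.get(execution_id)
    if task is None:
        return 404, {"error": "unknown execution id"}
    if task["status"] == "COMPLETE":
        # Nothing is left to cancel; 409 Conflict is one reasonable answer.
        return 409, {"status": "COMPLETE"}
    # The first call flips the state. Repeated calls find the task already
    # CANCELLED and leave it unchanged, which makes the operation idempotent.
    task["status"] = "CANCELLED"
    return 200, {"status": "CANCELLED"}
```

Whether the work done so far is rolled back or kept would be handled by the worker that observes the CANCELLED state, which this sketch leaves out.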

6. Best Practices

Do not wait for long-running tasks to complete as part of ordinary HTTP request processing.
Provide dedicated URLs to query the task status.
Provide a mechanism to cancel a long-running task.
The task execution process should not depend on the client in any way.
Consider using the Retry-After header field in the API response to indicate how long the user agent ought to wait before retrying the same request if the previous request was not accepted for any reason.
Consider using the Problem Details for HTTP APIs specification (RFC 9457, which obsoletes RFC 7807) when returning an error response.
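The last two practices can be illustrated together. The following sketch builds an error response in the Problem Details format with an optional Retry-After header, and shows how a client might turn a Retry-After value (either a number of seconds or an HTTP date) into a delay. The function names problem_response and retry_delay_seconds are hypothetical, and the 503 example is only an assumed scenario such as a full task queue.

```python
import json
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def problem_response(status, title, detail, retry_after_seconds=None):
    """Build (status, headers, body) using the Problem Details media type."""
    headers = {"Content-Type": "application/problem+json"}
    if retry_after_seconds is not None:
        headers["Retry-After"] = str(retry_after_seconds)
    body = {
        "type": "about:blank",  # default type when no specific problem URI exists
        "title": title,
        "status": status,
        "detail": detail,
    }
    return status, headers, json.dumps(body)


def retry_delay_seconds(retry_after, now=None):
    """Convert a Retry-After value into the number of seconds to wait.

    The header carries either delay seconds ("30") or an HTTP date
    ("Wed, 21 Oct 2015 07:28:00 GMT").
    """
    value = retry_after.strip()
    if value.isdigit():
        return float(value)
    now = now or datetime.now(timezone.utc)
    retry_at = parsedate_to_datetime(value)
    # Never return a negative delay if the date is already in the past.
    return max(0.0, (retry_at - now).total_seconds())
```

For example, a server whose queue is full might return problem_response(503, "Service Unavailable", "Task queue is full.", retry_after_seconds=30), and the client would wait retry_delay_seconds("30") seconds before resubmitting.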

7. Summary

Designing a RESTful API for long-running tasks is challenging because HTTP, the protocol most commonly used for REST, is inherently stateless and request-response oriented, which does not suit long-running operations. Still, we can divide the operation into two or more parts (submission, status, and result) and create an API for each, thus effectively designing a REST API that accommodates long-running tasks.

We should also consider using webhook notifications or Server-Sent Events to let the server push updates to clients when a task's status changes, reducing the need for clients to poll repeatedly.
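As a small illustration of the Server-Sent Events option, the sketch below serializes a task status update into the text/event-stream wire format. The function name format_sse and the event name "task-status" are assumptions made for this example; a real endpoint would also need to stream these messages over a long-lived response.

```python
import json


def format_sse(event, data, event_id=None):
    """Serialize one Server-Sent Events message for a text/event-stream response."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data field is enough.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"
```

A browser client could listen for these messages with the standard EventSource API and update its view whenever a "task-status" event arrives.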

Happy Learning !!