Imagining a universal workflow automation

Martin Malinda
12 min read · Feb 13, 2024

I have been coding in JS and building on the web for a fair while, but in recent years I have explored the area of business process automation and the whole world of ‘no code’ / ‘low code’. There are things I like about it. It’s fun to spend time dragging, dropping and visually playing with elements instead of doing everything with code. It’s refreshing to integrate services in a few moments instead of studying documentation on their OAuth workflow. At the same time, I can see how much more power coding still has over any no-code tool. You can share code, reuse code, compose code, generate code with AI, and write comments anywhere you want.
As your no-code project grows in complexity, you quite soon hit a limit that’s hard to push further, while with code the ceiling is much higher before you hit diminishing returns in productivity.

One way the low code world could push the ceiling further is probably via some kind of convergence. After all, automation is about seamless interconnection and the closer everything is together, the easier that is.

The web stands on the trifecta of HTML / CSS / JavaScript. It’s such a fundamental layer that tens of thousands of products are built on this shared foundation. The web runs on shared protocols, specifications and open source collaboration, and that’s what makes it so powerful.

As a web developer who is exploring the world of business process automation, it hurts to see so many different formats and approaches that do the same thing — execute a series of tasks.

There are so many platforms that deal with tasks and pipelines: Zapier, Pipedream, Make.com, Windmill… And as far as I know, all of them have their own format for describing the workflow process in code, even though the philosophy is essentially the same:

  1. There’s a trigger at the beginning — often incoming data hitting a webhook, a time-based interval, or the detection of some change in data.
  2. The first task runs. Each task has clearly defined inputs and outputs; the first task probably uses some output from the trigger.
  3. Subsequent tasks run afterwards, using the outputs of previous tasks as inputs.
  4. Each task can define which previous tasks it needs to wait for before executing. This allows different tasks to execute in parallel.
  5. The UI shows you the progress of the running pipeline, shows where an error happened, and lets you re-run from a specific task.
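The five steps above can be sketched as a tiny engine. This is a minimal, hypothetical illustration rather than any platform’s real API: each task declares which other tasks it needs, and the engine repeatedly runs every task whose dependencies are satisfied, in parallel.

```javascript
// Minimal sketch of the shared workflow philosophy (hypothetical API).
// Each task declares its dependencies via `needs`; the engine runs every
// task whose dependencies have finished, so independent tasks run in parallel.
async function runWorkflow(trigger, tasks) {
  const results = { trigger }; // outputs of finished tasks, keyed by name
  const pending = new Map(Object.entries(tasks));

  while (pending.size > 0) {
    // Pick every task whose declared dependencies already have results.
    const ready = [...pending].filter(([, task]) =>
      (task.needs ?? []).every((dep) => dep in results)
    );
    if (ready.length === 0) {
      throw new Error('Cyclic or missing dependency');
    }
    // Run the ready batch in parallel and record each task's output.
    await Promise.all(
      ready.map(async ([name, task]) => {
        pending.delete(name);
        results[name] = await task.run(results);
      })
    );
  }
  return results;
}
```

A workflow is then just data: a trigger payload plus a map of named tasks, each with its needs list and a run function that reads previous outputs.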

Universal workflow language

Since the same problem is being solved over and over, the whole automation landscape would, I think, greatly benefit if different services agreed on a standardised format / protocol. There are many benefits to having this…

  • You could export a Zap and load it in Pipedream or Make.com and vice versa (Although there’s often no incentive to allow your clients to easily migrate to a competitor)
  • You could have the workflow running locally on some open source workflow engine
  • You could pick from multiple diagram programs and feed it your workflow to visualise it
  • Your workflows could be version controlled

Moreover, I think that once we had this highly polished automation format, and software that could interpret and read it, workflow automation overall could become more flexible and reusable.

Workflows everywhere

Some of the most advanced automation services out there are specialised to one domain, for example CI / CD — continuous integration and continuous deployment. In a way, continuous deployment follows the same philosophy as generic workflow automation. The trigger for the deployment is often a commit in a code repository. Then a series of tasks runs, often in parallel pipelines, until the last action uploads files to the right place in the cloud. GitHub Actions is one of the most amazing workflow automation programs out there, so it seems a pity to me that the engine is limited to deploying programs.

I believe (or I want to believe) that in the future CI/CD will just be one of multiple folders in your automation suite.

Most software could be seen as a series of workflows. Some programs have a lot of background logic that does not fit this model — for example games simulating a world, various kinds of autonomous AIs, or an antivirus continuously scanning your computer. But a lot of programs just wait for an event to arise, react with a bunch of actions, and then sleep until the next event.

CI / CD

Trigger: New commit in a repository (or manual action)
-> build app
-> lint, run tests
-> deploy to the cloud

Business process automation

Trigger: New email with specific subject
-> add a task to Asana
-> send a Slack message

Marketing automation

Trigger: user filled a form
-> Track activity in CRM
-> Schedule a new email with template X

API server

Trigger: request received on URL '...'
-> Validate the request
-> Query the database
-> Send a response to the client

UI

Trigger: user performed an action (mouse event, keyboard event...)
-> Analyze the performed action
-> Compute new state
-> Rerender relevant part of the screen

All of these domains have their own specialised platforms and all of them separately invent and maintain some kind of workflow engine with its own UI.

What I think would be more powerful is if we embraced a generic workflow tool and then let CI / CD, marketing automation, business process automation and so on be handled by plug-ins and extensions. The particular challenge with CI / CD is that it often involves long-running tasks, such as building and testing the app, which necessitate creating new, isolated environments. A CI / CD extension could specialise in just that and only communicate with the main workflow runner. It would be headless. In recent years we’ve seen a rise in headless services (meaning they don’t have their own GUI). Oftentimes they offer a premium service precisely because they have a narrower scope and don’t have to reinvent the wheel.

One marketplace / registry?

Standardization would also allow publishing apps / packages that can run anywhere. At the moment, if you want to add your service to the workflow runners out there, you have to do it separately for Zapier, Pipedream, Make.com or perhaps the GitHub Marketplace.

In the world of code, there are package managers with a standardised format. For web development, the main package manager is NPM. Node packages follow specific conventions: they can declare dependencies, and they can define pre-install and post-install scripts.

Even if there are multiple different workflow services, they could all connect to the same registry. With a sufficient protocol, you could create a package for your product and have it appear everywhere.
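To make this concrete, here is a purely hypothetical sketch of what a package manifest in such a shared registry could look like; every field name is invented for illustration. The point is that declared inputs and outputs would let any vendor’s runner render a form for the step and validate a configured call before running it.

```javascript
// Purely hypothetical: a manifest for a package in a shared workflow
// registry. Every field name here is invented for illustration.
const manifest = {
  name: 'todoist-create-task',
  version: '1.2.0',
  runtime: 'nodejs20',
  // Declared inputs would let any vendor's runner render a form for the
  // step and validate calls before running it.
  inputs: {
    content: { type: 'string', required: true },
    due: { type: 'string', required: false },
    priority: { type: 'number', required: false },
  },
  outputs: {
    taskId: { type: 'string' },
  },
};

// Any runner could validate a configured step against the manifest:
function validateInputs(manifest, values) {
  return Object.entries(manifest.inputs)
    .filter(([key, spec]) => spec.required && !(key in values))
    .map(([key]) => key); // names of missing required inputs
}
```

Because the manifest is plain data, the same package could appear in the catalog of every runner that speaks the protocol.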

Overview of existing workflow languages / formats

I think an important part of convergence is coming up with a universal, highly readable format.

Zapier allows you to export Zap workflows to a JSON format if you’re a premium user.

Here’s an example:

{
  "zap": {
    "name": "Email to Todoist and Slack Notification",
    "enabled": true,
    "trigger": {
      "app": "gmail",
      "name": "New Email",
      "fields": {
        "label": "Inbox"
      }
    },
    "actions": [
      {
        "app": "todoist",
        "name": "Create Task",
        "fields": {
          "content": "{{subject}}",
          "due_string": "today",
          "priority": 4
        }
      },
      {
        "app": "slack",
        "name": "Post Message",
        "fields": {
          "channel": "#general",
          "text": "New task created in Todoist from email: {{subject}}"
        }
      }
    ],
    "accounts": {
      "gmail": { "account_id": "gmail_account_1234" },
      "todoist": { "account_id": "todoist_account_5678" },
      "slack": { "account_id": "slack_account_9012" }
    }
  }
}

This workflow is triggered on a new email; it creates a task in Todoist and sends a notification to Slack. It’s fairly readable, but Zapier does not support more advanced control flows, such as the branching Make.com offers. Make.com, on the other hand, does not allow exporting workflows at all.

I asked ChatGPT to reimagine this file as if it were processed by GitHub Actions in a workflow scenario. Here’s the outcome:

name: Email to Todoist and Slack Notification

on:
  schedule:
    - cron: '*/15 * * * *' # Runs every 15 minutes, adjust as needed

jobs:
  check_email_and_notify:
    runs-on: ubuntu-latest
    steps:
      - name: Check for new email in Gmail
        id: check_email
        uses: actions/get-gmail@v1
        with:
          gmail_token: ${{ secrets.GMAIL_TOKEN }}
          search_query: 'label:inbox is:unread'

      - name: Create Todoist Task
        if: steps.check_email.outputs.new_email == 'true'
        uses: actions/create-todoist-task@v1
        with:
          todoist_token: ${{ secrets.TODOIST_TOKEN }}
          task_content: ${{ steps.check_email.outputs.email_subject }}
          due_string: 'today'

      - name: Notify on Slack
        if: steps.check_email.outputs.new_email == 'true'
        uses: actions/slack-notify@v1
        with:
          slack_token: ${{ secrets.SLACK_TOKEN }}
          channel: '#general'
          message: 'New task created in Todoist from email: ${{ steps.check_email.outputs.email_subject }}'

The readability seems improved to me, and I doubt it’s just because of the switch from JSON to YAML. It feels more complete with the with clause that clearly defines the inputs of each step. Versioning the apps used in the workflow also makes the whole process more stable.

GitHub Actions has a lot of well-documented workflow mechanisms, for example needs, which lets you run multiple jobs in parallel and then converge when needed.
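For illustration, this is how needs looks in a real GitHub Actions file: deploy waits for both build and lint, which therefore run in parallel.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "build"
  lint:
    runs-on: ubuntu-latest
    steps:
      - run: echo "lint"
  deploy:
    needs: [build, lint] # build and lint run in parallel; deploy waits for both
    runs-on: ubuntu-latest
    steps:
      - run: echo "deploy"
```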

The only thing missing here is some kind of repeat / loop mechanism. That makes sense, since in a CI / CD flow it is rarely needed.

There are some similarities to open source workflow task runners, for example pypyr.io. Here’s an example of a pypyr workflow:

# ./show-me-what-you-got.yaml
context_parser: pypyr.parser.keyvaluepairs
steps:
  - name: pypyr.steps.echo
    in:
      echoMe: o hai!
  - name: pypyr.steps.cmd
    in:
      cmd: echo any cmd you like
  - name: pypyr.steps.shell
    in:
      cmd: echo ninja shell power | grep '^ninja.*r$'
  - name: pypyr.steps.py
    in:
      py: print('any python you like')
  - name: pypyr.steps.cmd
    while:
      max: 3
    in:
      cmd: echo gimme a {whileCounter}
  - name: pypyr.steps.cmd
    foreach: [once, twice, thrice]
    in:
      cmd: echo say {i}
  - name: pypyr.steps.default
    in:
      defaults:
        sayBye: False
  - name: pypyr.steps.echo
    run: '{sayBye}'
    in:
      echoMe: k bye!

Pypyr is pretty powerful. It allows branching and looping, and the result seems readable. It does not work with the concept of triggers, though.
The main use case for pypyr seems to be running it yourself, on your server or your local machine, and coding the triggering mechanism yourself. A big part of it is also the cmd functionality, which directly executes shell commands on the machine it runs on. In the cloud this could be tricky. CI/CD services are fine with it: they spin up a new isolated container for each pipeline run, so you’re free to execute shell commands in that isolated environment. But Zapier or Pipedream, as far as I know, don’t do this, and it makes sense — booting an entire new container environment is extra work and delays the start of the workflow. For CI/CD that’s acceptable, since you usually don’t deploy that many times per day. But for a generic workflow automation service it probably wouldn’t be a good default. If you need to fire a certain workflow hundreds of times per day, you can’t boot a dedicated container for each run. If anything, you need a more restricted code block with a specific API that can start in a matter of milliseconds.
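What such a restricted code block might look like is sketched below. This is my own hypothetical contract, not any vendor’s actual API: a step is just an async function with declared inputs, and it can only touch services injected by the runner, never the host machine.

```javascript
// Hypothetical contract, not any vendor's real API: a step is an async
// function with declared inputs. It has no shell access; it can only use
// the `services` object injected by the runner.
const createTaskStep = {
  inputs: ['subject'],
  async run({ inputs, services }) {
    const task = await services.todoist.createTask({ content: inputs.subject });
    return { taskId: task.id };
  },
};

// The runner validates declared inputs and invokes the step. Because no
// container boots, a step like this can start in milliseconds.
async function execute(step, inputs, services) {
  const missing = step.inputs.filter((key) => !(key in inputs));
  if (missing.length > 0) {
    throw new Error('Missing inputs: ' + missing.join(', '));
  }
  return step.run({ inputs, services });
}
```

The narrow API is the trade-off: you lose arbitrary shell access, but the runner can start thousands of such steps per day without provisioning anything.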

There’s another open source service which is very flexible and offers a lot of functionality — Windmill. Windmill Flows allow branching, looping and everything you need. Exported flows come in JSON format, but I can’t share a complete example here. While the core of the file is human-readable, it includes a lot of technical details intended solely for the task runner. Windmill is very promising and open source, but the format seems too “internal” and not something that could be reused elsewhere or adjusted by a human / AI.

AWS and Azure

Two big players have defined their own state languages: AWS with the Amazon States Language behind Step Functions, and Azure with the workflow definition language behind Logic Apps.

This is an example of an Azure Logic App:

{
  "triggers": {
    "When_a_HTTP_request_is_received": {
      "type": "Request",
      "methods": ["POST"]
    }
  },
  "actions": {
    "Condition": {
      "type": "If",
      "expression": "@contains(triggerBody()['message'], 'Hello')",
      "actions": {
        "Yes": [
          {
            "type": "For_each",
            "value": "@triggerBody()['items']",
            "actions": {
              "Send_an_email": {
                "type": "SendEmail",
                "parameters": {
                  "To": "example@example.com",
                  "Subject": "Loop Item",
                  "Body": "@items('For_each')"
                }
              }
            }
          }
        ],
        "No": []
      }
    }
  }
}

These are state languages, so perhaps one level of abstraction lower than a higher-level “workflow language”. They effectively describe a state machine rather than a workflow, and they don’t need a specific trigger defined.
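To make the distinction concrete, here is a minimal, hypothetical state-machine interpreter. Unlike a workflow, it has no trigger and no finish line; it just maps a (state, event) pair to the next state and keeps reacting to events forever.

```javascript
// Hypothetical sketch: a state machine maps (state, event) -> next state.
// There is no trigger and no final task; it just keeps reacting to events.
const machine = {
  initial: 'Waiting',
  states: {
    Waiting: { on: { REQUEST: 'Checking' } },
    Checking: { on: { CONTAINS_HELLO: 'Emailing', OTHERWISE: 'Waiting' } },
    Emailing: { on: { SENT: 'Waiting' } },
  },
};

function transition(machine, state, event) {
  // Unknown events leave the machine in its current state.
  return machine.states[state].on[event] ?? state;
}
```

A workflow, by contrast, starts at a trigger and runs to completion; this machine simply loops among its states as events arrive.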

In some ways this seems like the proper way to go. Personally, though, I miss some kind of identifier for each task and the ability to refer to it in subsequent steps.

And perhaps I am biased, but I like the higher-level formats, such as that of GitHub Actions, more.

Workflows as code

Defining workflows as code is ultimately the most powerful way to do it. But the problem is that, once again, this is too powerful — perhaps even more so than the ability to execute bash commands.

Cloud platforms likely wouldn’t accept a script that could execute code with side effects to define the workflow.

But ultimately this approach is tempting to me, although it likely doesn’t have the potential to become the “universal way”.

As a frontend developer, I think back to the era when we used two alternative tools to build and compile web applications: Grunt and Gulp.

Grunt was a build tool driven by a declarative, JSON-like configuration object, so it is configuration-driven and in many ways close to the examples outlined above. You define a structure and the underlying engine interprets it and hopefully does what you want.

Gulp, on the other hand, is imperative. You don’t just declare what should happen; you describe how to do it, in code. Back in those days I didn’t really like Grunt, and I learned to really like Gulp. It gave me more power, and through experimentation I could usually debug my way to what I wanted.

Consider this example:

const gulp = require('gulp');
const { checkForNewEmail, createTodoistTask, notifyOnSlack } = require('api');

function createWorkflow() {
  let emailDetails = {}; // Encapsulated state

  async function checkEmail() {
    // Implementation...
    emailDetails = await checkForNewEmail(...);
  }

  async function todoistTask() {
    if (!emailDetails.newEmail) return;
    // Implementation...
    await createTodoistTask(...);
  }

  async function slackNotification() {
    if (!emailDetails.newEmail) return;
    // Implementation...
    await notifyOnSlack(...);
  }

  return gulp.series(checkEmail, todoistTask, slackNotification);
}

gulp.task('emailToTodoistAndSlack', createWorkflow());

As a JavaScript developer, I like this a lot. I would love to be able to imperatively interact with the API instead of setting up a configuration file. And I think this approach could even work with some no-code / low-code automation cloud services: the workflow UI could generate a script, and you could also provide a script yourself. It opens the door to all kinds of meta practices, and if you run this code locally it’s really easy to debug. But yes, this is challenging. Say you provide a script at first and then the UI interprets and overrides it; certain details could be lost. The service would have to patch the script in a really smart way so that it does not lose any previous functionality. I think there’s something to this, but it’s definitely not universal, and so not the way to unify the workflow automation world.

Conclusions

As tech overall evolves, unification happens. The web was unified and developed standards: HTML, CSS and JS were standardised. A need for types appeared, several solutions emerged in the wild, and over time the community converged on TypeScript. The JSON format was developed and is now widely used all over. As Web 2.0 gave rise to more dynamic web apps communicating with servers via APIs, specifications for JSON APIs appeared: OpenAPI, JSON:API and so on.

I’m a big fan of Airtable. A lot of custom SaaS solutions are slowly being replaced with it, and that’s a good thing. Instead of using 10 different services, you can use one really flexible one and mod it with a few plugins / automations. Universality and flexibility are better than rigidity. It’s better if software bends to your needs than if you bend to the structure of the software.

I’m hoping this trend will continue in the workflow automation landscape. CI / CD, business process automation or perhaps even DevOps could be unified under one generic automation platform. Whether you’re just connecting your CRM to your email, or deploying a webapp from your Github repository or perhaps even booting up infrastructure (something you would otherwise use Terraform for), you would all do it in one shared platform, operating on one universal workflow language. Instead of Marketing, DevOps and Business Process Automation specialists, we would have just Automation engineers, because the underlying patterns are the same. Perhaps Home Automation and IoT in general could fit into this scheme as well. And AI would be overall much better able to assist — unification helps LLMs greatly as they have one larger dataset to train with, instead of several smaller ones.

And if we stretch this further, this generic workflow automation software could be used as a backend API as well. Think of a drag-and-drop UI as your backend. After all, an API endpoint is often just a workflow: on request received, you run a bunch of steps, and at the end you send a response. There are already services for this, like Xano, but you could use Windmill for this use case as well.
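As a sketch of that idea, an endpoint can be written in the same trigger-plus-steps shape as any other workflow here. The names and structure are illustrative, not a real framework:

```javascript
// Hypothetical: an API endpoint expressed as the same trigger -> steps shape
// used for the other workflows in this article. Not a real framework.
const endpoint = {
  trigger: { type: 'http', method: 'GET', path: '/users/:id' },
  steps: [
    {
      name: 'validate',
      run: async (ctx) => {
        if (!ctx.params.id) throw new Error('id required');
      },
    },
    {
      name: 'query',
      // Stands in for a database call; returns fake data for illustration.
      run: async (ctx) => {
        ctx.user = { id: ctx.params.id, name: 'Ada' };
      },
    },
    {
      name: 'respond',
      run: async (ctx) => ({ status: 200, body: ctx.user }),
    },
  ],
};

// On request received, run the steps in order; the last step's return value
// becomes the response.
async function handle(endpoint, params) {
  const ctx = { params };
  let result;
  for (const step of endpoint.steps) {
    result = await step.run(ctx);
  }
  return result;
}
```

Swap the trigger for a webhook or a cron schedule and the same runner serves marketing automation, CI / CD or an API server.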

At the moment I am a big fan of GitHub Actions. I like the philosophy of the GitHub Marketplace, and I like how the YAML configuration file is structured. I would love to see something similar but more universal. Windmill is probably closest to that: it’s open source, it has great workflow capabilities and more. It’s just that the UX is so far a bit too technical and targeted at developers. But I’m super curious where it will evolve.

Probably it’s not a matter of if, but rather when.
