What makes EDI so hard?
EDI – Electronic Data Interchange – is an umbrella term for many different “standardized” frameworks for exchanging business-to-business transactions. It dates back to the 1960s and remains a pain point in every commercial industry from supply chain and logistics to healthcare and finance. What makes it so hard? Why is it still an unsolved problem despite many decades of immense usage?
These are the questions we get most often from developers – both developers who want to become Stedi customers and developers who want to join us to help build Stedi. And these are our favorite questions to answer, too – the world of EDI is complex, opaque, and arcane, but it’s also enormously powerful, underpinning huge swaths of the world economy.
The problem is that there just aren’t any good, developer-focused resources out there that can help make sense of EDI from the ground up. The result is that this wonderfully rich ecosystem has been locked away from the sweeping changes that are happening in the broader world of technology.
We have a 60+ person-strong engineering team at this point, and so we’ve had the good fortune of ramping up a number of people from zero to a solid working knowledge of how the whole EDI picture fits together. It helps to start at the highest level and get more and more specific as you go; this piece you’re reading now is the first one that our engineers read when they join our team – a sort of orientation to the wide, wide world of EDI – and we’ve decided to post it publicly to help others ramp up, too.
At the most basic level, there are many thousands of businesses on Earth; those businesses provide a wide variety of goods and services to end consumers. Since few businesses have the resources or the desire to be completely self-sufficient, they must exchange goods and services between one another in order to deliver finished products. This mechanism is known as trade.
When two businesses wish to conduct trade, they must exchange certain business documents, or transactions, which contain information necessary to conduct the business at hand. There are hundreds of conceivable transaction types for all manner of situations; some are common across many industries, like Purchase Orders and Invoices, and some are specific to a class of business, such as Load Tenders or Bills of Lading, which pertain only to logistics.
Businesses used to exchange paper transactions and record those transactions into a hand-written book called a ledger (which is why we refer to accounting as “the books,” and people who work with accounting systems as “bookkeepers”), but modern businesses use one or many software applications, called business systems, to facilitate operations. There are many types of business systems, ranging from generic software suites like Oracle, SAP, and NetSuite to vertical-specific products that serve some particular industry, like purpose-built systems for healthcare, agriculture, or education.
Each business system uses a different internal transaction format, which makes it impossible to directly import a transaction from one business system into another; even if both businesses were using the exact same version of SAP – say, SAP ERP 6.0 – the litany of configuration and customization options (each of which affects the system’s transaction format) renders direct interoperability extraordinarily improbable. These circumstances necessitate the conversion of data from one format to another prior to ingestion into a new system.
The most popular method of data conversion is human data entry. Customer A creates Purchase Order n in NetSuite (its business system) and emails a PDF of Purchase Order n to Vendor B. A clerical worker employed by Vendor B enters the data from Purchase Order n into SAP (its business system). The clerk “maps” data from one format to another as necessary – for example, entering the value from the “PO No.” field on Customer A’s PDF into the field that Vendor B’s business system calls “Purchase Order #”.
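The clerk's mental mapping can be written down explicitly. Here is a minimal sketch in Python; only “PO No.” and “Purchase Order #” come from the example above, while the other field names, the sample values, and the `map_fields` helper are hypothetical and purely illustrative.

```python
# Hypothetical mapping from the field names on Customer A's PDF to the
# field names Vendor B's business system expects. Only "PO No." and
# "Purchase Order #" come from the text; the rest are made up.
FIELD_MAP = {
    "PO No.": "Purchase Order #",
    "Ship To": "Shipping Address",
    "Qty": "Quantity",
}


def map_fields(source, field_map):
    """Rename each source field; fail loudly on fields with no known mapping."""
    unmapped = [name for name in source if name not in field_map]
    if unmapped:
        raise KeyError(f"no mapping defined for: {unmapped}")
    return {field_map[name]: value for name, value in source.items()}


customer_a_po = {"PO No.": "4500012345", "Ship To": "1 Main St", "Qty": 12}
vendor_b_po = map_fields(customer_a_po, FIELD_MAP)
print(vendor_b_po)
```

Refusing to guess at unmapped fields is a deliberate choice in this sketch: a person would stop and ask, and the code does the equivalent by raising an error.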
People are smart – you can think of a person sort of like AI, except that it actually works – and are able to handle these sorts of mappings on the fly with little training. But manual data entry is costly, error-prone, and impractical at high volumes, so businesses often choose to pursue some method of transaction automation, or trading partner integration, in order to programmatically ingest transactions and avoid manual data entry.
Since each business has multiple trading partners, and each of its trading partners operates on different business systems, “point to point” integration of these transactions (that is, mapping Walmart’s internal database format directly to the QuickBooks JSON API) would require the recipient to have detailed knowledge of many different transaction formats – for example, a company selling to three customers running on NetSuite, SAP, and QuickBooks, respectively, would need to understand NetSuite XML, SAP IDoc, and QuickBooks JSON. Maintaining familiarity with so many systems isn’t practical; to avoid this explosion of complexity, businesses instead use commonly-accepted intermediary formats, which are broadly known as Electronic Data Interchange – EDI.
EDI is an umbrella term for many different “standardized” frameworks for exchanging business-to-business transactions, but it is often used synonymously with two of the most popular standards – X12, used primarily in North America, and EDIFACT, which is prevalent throughout Europe. It’s important to note that EDI isn’t designed to solve all of the problems of B2B transaction exchange; rather, it is designed to eliminate only the unrealistic requirement that a trading partner be able to understand each of its trading partners’ internal syntax and vocabulary.
Instead of businesses having to work with many different syntaxes (e.g., JSON, XML, CSV) and vocabularies (e.g., PO No. and Purchase Order #), frameworks like X12 and EDIFACT provide highly structured, opinionated alternatives intended to reduce the surface area of knowledge required to successfully integrate with trading partners. All documents conforming to a given standard follow that standard’s syntax, allowing an adopter of the standard to work with just one syntax for all trading partners who have also adopted that standard.
Further, standards work to reduce variation in vocabulary and the placement of fields, where possible. The X12 standard, for example, has an element type 92 which enumerates Purchase Order Type Codes; the enumerated value Dropship reflects X12’s opinion that POs that might be colloquially referred to as Drop Shipment or Dropship will all be referenced as Dropship, which limits the vocabulary for which an adopter might have to account. Similarly, X12 has designated the 850 Purchase Order’s BEG03 element – that is, the value provided in the third position of the BEG segment in the 850 Purchase Order transaction set – as the proper location for specifying the Purchase Order number. This reduces some of the burden of mapping a transaction into or out of an adopter’s business system; only one value for drop shipping and one location for PO number must be mapped.
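To make positional addressing concrete, here is a small sketch that splits a BEG segment into its numbered elements. It assumes the commonly seen delimiters (`*` between elements, `~` ending a segment), whereas real interchanges declare their own delimiters in the envelope header. The sample segment and its values are made up, `parse_segment` is a hypothetical helper, and `DS` is, to the best of our understanding, the code X12 uses for Dropship in element 92.

```python
# Assumed delimiters; a real parser would read them from the interchange header.
ELEMENT_SEPARATOR = "*"
SEGMENT_TERMINATOR = "~"

# Illustrative BEG segment from an 850 Purchase Order (values are made up).
raw_segment = "BEG*00*DS*PO-12345**20211001~"


def parse_segment(raw):
    """Split one segment into a dict keyed by positional name (BEG01, BEG02, ...)."""
    body = raw.strip().rstrip(SEGMENT_TERMINATOR)
    segment_id, *elements = body.split(ELEMENT_SEPARATOR)
    # Empty positions are kept as empty strings so later positions stay aligned.
    return {
        f"{segment_id}{position:02d}": value
        for position, value in enumerate(elements, start=1)
    }


beg = parse_segment(raw_segment)
po_type_code = beg["BEG02"]  # element 92, Purchase Order Type Code
po_number = beg["BEG03"]     # the designated location for the PO number
print(po_type_code, po_number)
```

Because the PO number always lives at BEG03, the lookup is the same line of code for every trading partner that sends a conforming 850.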
Of course, the vast majority of fields cannot be standardized to this degree. Take, for example, the product identifier of a line item on a Purchase Order – while X12 specifies that the 850 Purchase Order’s PO107 element should be used to specify the product identifier value, the standard cannot possibly mandate which type of product identifier should be used. Some companies use SKUs (Stock Keeping Units), while others use Part Numbers, UPCs (Universal Product Codes), or GTINs (Global Trade Item Numbers); all in all, the X12 standard specifies a dictionary of 544 different possible product identifier values that can be populated in the
What we’re seeing here is that while a standard can be opinionated about the structure of a document and the naming of fields, it cannot be opinionated about the contents of a business transaction – the contents of a business transaction are dictated by the idiosyncrasies of the business itself. If a standard doesn’t provide enough flexibility to account for the particulars of a given business, businesses would choose not to opt into the standard. A standard like X12, then, does not provide an opinionated definition of transactions – it provides a structured superset of all the possible values of commerce.
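The product identifier case shows what a "structured superset" looks like in practice. In our understanding of X12, the identifier in PO107 is paired with a qualifier code in the preceding position, PO106, that says which kind of identifier it is. The sketch below treats that pairing, the handful of qualifier codes, the `*` and `~` delimiters, and the sample segments as assumptions for illustration; `product_identifier` is a hypothetical helper.

```python
# A few qualifier codes as we understand them; the real dictionary is far larger.
QUALIFIER_NAMES = {
    "SK": "Stock Keeping Unit (SKU)",
    "UP": "Universal Product Code (UPC)",
    "VP": "Vendor's part number",
    "BP": "Buyer's part number",
}


def product_identifier(po1_segment):
    """Return (identifier type, identifier value) from a PO1 line item segment."""
    elements = po1_segment.strip().rstrip("~").split("*")
    # elements[0] is the segment ID "PO1", so PO106 is elements[6]
    # and PO107 is elements[7].
    qualifier, identifier = elements[6], elements[7]
    kind = QUALIFIER_NAMES.get(qualifier, f"unrecognized qualifier {qualifier}")
    return kind, identifier


# Two made-up line items for the same product, identified two different ways.
retailer_a_line = "PO1*1*10*EA*9.99**UP*012345678905~"
retailer_b_line = "PO1*1*10*EA*9.99**SK*WIDGET-BLUE-L~"

print(product_identifier(retailer_a_line))
print(product_identifier(retailer_b_line))
```

Both segments are valid under the same standard, yet a recipient still has to know which identifier type each partner sends and how it relates to its own catalog.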
Intermediary formats – that is, EDI – make life somewhat easier by limiting the number of different formats that a business must understand in order to work with a wide array of trading partners; a US-based brand selling to dozens of US-based retailers likely only needs to work with the X12 format. However, the brand still needs to account for the different ways that each retailer uses X12. Again, X12 is just a dictionary of possible values – since Walmart and Amazon run their businesses in different ways (and on different business systems), their implementation of an X12 intermediary format will differ, too.
A simple example may be that Walmart allows customers to include a gift message at the order level (“Happy Birthday – and Merry Christmas!”), whereas Amazon allows its customers to specify gift messages at the line item level (“Happy Birthday!” at line item #1, “and Merry Christmas!” at line item #2). This difference in implementation of gift message means that a brand selling to both Amazon and Walmart would need to account for these differences when ‘mapping’ the respective fields to its business system.
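A sketch of what that per-partner accounting might look like follows. It assumes the inbound orders have already been parsed into dictionaries and that the brand's business system stores a list of gift notes per order; the dictionary shapes, the `gift_notes` field, and the function names are all hypothetical.

```python
def from_order_level(order):
    """Mapping for a partner that sends one gift message per order."""
    message = order.get("gift_message")
    return {
        "po_number": order["po_number"],
        "skus": [line["sku"] for line in order["lines"]],
        "gift_notes": [message] if message else [],
    }


def from_line_level(order):
    """Mapping for a partner that sends gift messages per line item."""
    return {
        "po_number": order["po_number"],
        "skus": [line["sku"] for line in order["lines"]],
        "gift_notes": [
            line["gift_message"] for line in order["lines"] if line.get("gift_message")
        ],
    }


# One explicitly chosen mapping per trading partner.
PARTNER_MAPPINGS = {"retailer_a": from_order_level, "retailer_b": from_line_level}

order_a = {
    "po_number": "A-1001",
    "gift_message": "Happy Birthday!",
    "lines": [{"sku": "MUG"}, {"sku": "SOCKS"}],
}
order_b = {
    "po_number": "B-2002",
    "lines": [
        {"sku": "MUG", "gift_message": "Happy Birthday!"},
        {"sku": "SOCKS", "gift_message": "and Merry Christmas!"},
    ],
}

internal_a = PARTNER_MAPPINGS["retailer_a"](order_a)
internal_b = PARTNER_MAPPINGS["retailer_b"](order_b)
print(internal_a["gift_notes"], internal_b["gift_notes"])
```

The two functions differ by only a few lines, but someone who understands both partners' conventions still has to write and maintain each one.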
Such per-trading-partner nuances cannot be avoided – because different businesses operate in different ways, a single, canonical, ultra-opinionated representation of, say, a Purchase Order, is unlikely to ever exist. In other words, the per-trading-partner setup process is driven by inherent complexity – that is, complexity necessitated by the unavoidable circumstances of the problem at hand. And because field mappings such as these affect real-world transactions, they cannot be done with a probabilistic machine learning approach; for example, mapping “Shipper Address” to “Shipping Address” would result in orders being shipped to the shipper’s own warehouse, rather than the customers’ respective addresses. While there are many ways to build business-to-business integrations, any solution must account for a setup process that involves per-trading-partner, human-driven field mappings.
There are other areas of inherent complexity in EDI, too. Because businesses change over time, the configurations of the businesses’ respective business systems must change, too; an example might be a retailer adding DHL as a shipping option, whereas it previously only offered FedEx. Those changes must be communicated to trading partners so that field mappings can be updated appropriately; because such communications and updates involve ‘best efforts’ from humans, some percentage of them will be missed or completed incorrectly, leading to integration failures on subsequent transactions. Even without inter-business changes, errors happen – for example, a business system’s API keys might expire, or the system might experience intermittent downtime. Such errors will need to be reviewed, retried, and resolved. Just as every solution’s setup process will always require per-trading-partner, human-driven field mappings, every solution must also provide functionality for managing configuration changes on the control plane and intermittent errors on the data plane.
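On the data plane, "reviewed, retried, and resolved" might be sketched as bounded retries followed by a hand-off to a person. This is one possible shape under stated assumptions, not a description of any particular product: `TransientError`, `deliver_with_retry`, the review queue, and the flaky sender are all hypothetical names invented for illustration.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. brief downtime."""


def deliver_with_retry(send, transaction, review_queue, max_attempts=3, base_delay=0.5):
    """Try to deliver; after max_attempts transient failures, park for human review.

    Any other exception (say, an expired API key) propagates immediately,
    since retrying would not help.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send(transaction)
        except TransientError:
            if attempt < max_attempts:
                # Exponential backoff: base_delay, then 2x, then 4x, ...
                time.sleep(base_delay * 2 ** (attempt - 1))
    review_queue.append(transaction)
    return None


def make_flaky_sender(failures_before_success):
    """Build a fake business system that fails a set number of times, then accepts."""
    remaining = {"count": failures_before_success}

    def send(transaction):
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise TransientError("business system unavailable")
        return f"accepted {transaction}"

    return send


needs_review = []
recovered = deliver_with_retry(make_flaky_sender(2), "PO-12345", needs_review, base_delay=0)
parked = deliver_with_retry(make_flaky_sender(5), "PO-67890", needs_review, base_delay=0)
print(recovered, parked, needs_review)
```

The first transaction succeeds on its third attempt; the second exhausts its attempts and lands in the review queue, where a person decides what happens next.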
The 'laws of physics' of the business universe, then, are as follows:
- There are many businesses, and those businesses must conduct business-to-business trade in order to deliver end products and services;
- Those businesses run on a large but bounded number of business systems;
- Due to the heterogeneity of business practices, those business systems offer configuration options that result in an unbounded number of different configurations;
- The heterogeneity of configurations makes a single unified format impossible, therefore necessitating a per-trading-partner setup process;
- The business impact of incorrect setup requires that a human be involved in setup;
- Businesses are not static, so configuration will change over time, again necessitating human input;
- Business systems are not perfectly reliable;
- Because neither human input nor business systems are perfectly reliable, intermittent errors will occur on an ongoing basis;
- Errors must be resolved in order to maintain a reliable flow of business.
Any generalized business integration system must account for all these constraints. The so-called Law of Sufficient Variety summarizes this nicely: “in order to ensure stability, a control system must be able to, at a minimum, represent all possible states of the system it controls.” Failing that, it must limit scope in some way – say, by only handling a subset of transaction types, industries, business systems, or use cases. But since limiting scope definitionally means limiting market size, any sufficiently ambitious effort to develop a business integration system must not limit scope at all: it must provide mechanisms to work with any configuration of any business system, in any industry, across 300+ business-to-business transaction types, in any format, and it must provide for any evolutions that might develop in the future.
Such is the challenge of developing a generalized business integration system.
The good news is that such circumstances (an unbounded set of heterogeneous, complex, mission-critical, web-scale use cases) are not unique to business integrations – they mirror the circumstances found in the broader world of software development.
If, instead of setting out to create a software application for developing business integrations, one were to set out to create a software application for developing software applications, one would encounter the same challenges – to continue with the Law of Sufficient Variety, a system for developing software would need to be able to represent all possible states of the software it wishes to develop.
This, of course, isn’t reasonable – it isn’t feasible to design a single software application for developing software applications. Instead of a single application, successful platforms like Amazon Web Services provide a series of many small, relatively simple building blocks – primitives – that can be composed to represent virtually any larger application. AWS, then, can be thought of as a catalog of software applications for developing software applications. Using AWS’s array of offerings, software developers today can build web-scale applications with considerably less code and less knowledge of underlying concepts.
Whereas AWS is a catalog of developer-focused software applications for developing software applications, Stedi is a catalog of developer-focused software applications for developing business integrations.
Stedi’s platform serves three types of developers:
- Developers that need to build integrations to support their own business (e.g. an ecommerce retailer that wants to exchange EDI files with its suppliers);
- Developers that need to build EDI functionality into their own software offerings (e.g. a digital freight forwarder, transportation management system, or fulfillment provider that wants to embed self-service EDI integration into their platform);
- Developers at EDI providers or VANs that need to modernize their technology stack (e.g. a retail-specific EDI software provider that wants to replace legacy infrastructure or tools).
Using Stedi’s platform, developers can compose massively scalable and reliable integration systems of their own. Stedi’s catalog of developer-focused products includes two powerful building blocks, with many more in development:
- EDI Core (Generally Available as of October 2021), a flexible, API-driven EDI-to-JSON parser and validator, capable of validating against generic or company-specific standards.
- Mappings (Generally Available as of November 2021), a powerful JSON-to-JSON data transformation engine that allows developers to define mappings in an intuitive UI and invoke those mappings programmatically via API.
Over the coming months, we’ll be posting more of our internal guides to help others ramp up on the EDI universe. We’d love to hear from you in the meantime.