OpenTelemetry in Apollo Federation

Configure your federated graph to emit logs, traces, and metrics


OpenTelemetry is a collection of open-source tools for generating and processing telemetry data (such as logs, traces, and metrics) from different systems in a generic and consistent way.

You can configure your gateway, your individual subgraphs, or even a monolithic Apollo Server instance to emit telemetry related to processing GraphQL operations.

Additionally, the @apollo/gateway library provides built-in OpenTelemetry instrumentation to emit gateway-specific spans for operation traces.

If you are using GraphOS Router, it comes pre-built with support for OpenTelemetry.

Note: GraphOS Studio does not currently consume OpenTelemetry-formatted data. To push trace data to Studio, see Federated trace data. You should configure OpenTelemetry if you want to push trace data to an OpenTelemetry-compatible system, such as Zipkin or Jaeger.

Setup

1. Install required libraries

To use OpenTelemetry in your application, you need to install a baseline set of @opentelemetry Node.js libraries. This set differs slightly depending on whether you're setting up your federated gateway or a subgraph/monolith.

Gateway libraries
Bash
npm install \
  @opentelemetry/api@1.0 \
  @opentelemetry/core@1.0 \
  @opentelemetry/resources@1.0 \
  @opentelemetry/sdk-trace-base@1.0 \
  @opentelemetry/sdk-trace-node@1.0 \
  @opentelemetry/instrumentation@0.27 \
  @opentelemetry/instrumentation-http@0.27 \
  @opentelemetry/instrumentation-express@0.28
Subgraph/monolith libraries
Bash
npm install \
  @opentelemetry/api@1.0 \
  @opentelemetry/core@1.0 \
  @opentelemetry/resources@1.0 \
  @opentelemetry/sdk-trace-base@1.0 \
  @opentelemetry/sdk-trace-node@1.0 \
  @opentelemetry/instrumentation@0.27 \
  @opentelemetry/instrumentation-http@0.27 \
  @opentelemetry/instrumentation-express@0.28 \
  @opentelemetry/instrumentation-graphql@0.27

Most importantly, subgraphs and monoliths must install @opentelemetry/instrumentation-graphql, and gateways must not install it.

As shown above, most @opentelemetry libraries have reached 1.0, but the instrumentation packages are still pre-1.0. The instrumentation versions listed above are compatible with these 1.0 libraries at the time of this writing.
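To catch the most common mistake before deploying, you could run a small preflight check against each service's package.json. The following is a minimal sketch, not part of any Apollo or OpenTelemetry package: `checkOtelDependencies` and the role names `"gateway"` and `"subgraph"` are hypothetical, and it only checks a subset of the packages listed above.

```javascript
// Hypothetical preflight check; not part of any Apollo or OpenTelemetry package.
const GRAPHQL_INSTRUMENTATION = "@opentelemetry/instrumentation-graphql";

// A subset of the packages that both install lists above share
const COMMON_PACKAGES = [
  "@opentelemetry/api",
  "@opentelemetry/resources",
  "@opentelemetry/sdk-trace-base",
  "@opentelemetry/sdk-trace-node",
  "@opentelemetry/instrumentation-http",
  "@opentelemetry/instrumentation-express",
];

// role: "gateway", or "subgraph" for subgraphs and monoliths
// dependencies: the `dependencies` object from package.json
// Returns a list of problem descriptions (empty when everything looks right)
function checkOtelDependencies(role, dependencies = {}) {
  const problems = COMMON_PACKAGES.filter((name) => !(name in dependencies)).map(
    (name) => `missing ${name}`,
  );
  const hasGraphql = GRAPHQL_INSTRUMENTATION in dependencies;
  if (role === "gateway" && hasGraphql) {
    problems.push(`gateways must not install ${GRAPHQL_INSTRUMENTATION}`);
  } else if (role === "subgraph" && !hasGraphql) {
    problems.push(`subgraphs and monoliths must install ${GRAPHQL_INSTRUMENTATION}`);
  }
  return problems;
}

// Example: check the current project as a gateway
// const { dependencies } = require("./package.json");
// console.log(checkOtelDependencies("gateway", dependencies));
```

Returning a list of problems instead of throwing on the first one lets a CI step report every issue in a single run.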

Update @apollo/gateway

If you're using OpenTelemetry in your federated gateway, also update the @apollo/gateway library to version 0.31.1 or later to add support for gateway-specific spans.
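If you want to verify the installed version programmatically, a comparison like the following is enough for plain `major.minor.patch` strings. `meetsMinimum` is a hypothetical helper; for version ranges or prerelease tags, the `semver` package's `gte` function is a more robust choice.

```javascript
// Hypothetical helper: is `version` at least `minimum`?
// Handles plain "major.minor.patch" strings only (no ranges such as
// "^0.31.1" and no prerelease tags like "2.0.0-alpha.1").
function meetsMinimum(version, minimum) {
  const parse = (v) => v.split(".").map((part) => Number.parseInt(part, 10));
  const actual = parse(version);
  const required = parse(minimum);
  for (let i = 0; i < 3; i++) {
    const a = actual[i] || 0;
    const r = required[i] || 0;
    if (a !== r) return a > r; // first differing component decides
  }
  return true; // identical versions satisfy the minimum
}

// Example with version strings you might read from `npm ls @apollo/gateway`
console.log(meetsMinimum("0.31.0", "0.31.1")); // false
console.log(meetsMinimum("0.42.3", "0.31.1")); // true
```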

2. Configure instrumentation

Next, update your application to configure your OpenTelemetry instrumentation as early as possible in your app's execution. This must occur before you even import @apollo/server, express, or http. Otherwise, your trace data will be incomplete.
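Because this setup must run before those modules load, it can help to detect ordering mistakes at startup. The sketch below is a hypothetical guard (the function name and default package list are assumptions) that you might place at the top of your OpenTelemetry configuration file. It inspects Node's CommonJS `require.cache`, so it cannot detect built-in modules such as `http`, and it does not apply to ES module imports.

```javascript
// Hypothetical guard for the top of your OpenTelemetry file: warn if packages
// that should be instrumented were already loaded before this file ran.
// It reads Node's CommonJS module cache, so built-ins such as `http` (which
// never appear in require.cache) and ES module imports are not detected.
function modulesLoadedTooEarly(
  packageNames = ["express", "@apollo/server"],
  cachedPaths = Object.keys(require.cache),
) {
  // Normalize Windows path separators so one pattern works on every platform
  const normalized = cachedPaths.map((p) => p.replace(/\\/g, "/"));
  return packageNames.filter((name) =>
    normalized.some((p) => p.includes(`/node_modules/${name}/`)),
  );
}

const loadedEarly = modulesLoadedTooEarly();
if (loadedEarly.length > 0) {
  console.warn(
    `OpenTelemetry was configured after ${loadedEarly.join(", ")} loaded; ` +
      "trace data may be incomplete.",
  );
}
```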

We recommend putting this configuration in its own file, which you import at the very top of index.js. A sample file is provided below. Note the lines marked in its comments: delete them if you're setting up a gateway, or uncomment them if you're setting up a subgraph or monolith.

JavaScript
open-telemetry.js
1// Import required symbols
2const { Resource } = require('@opentelemetry/resources');
3const { SimpleSpanProcessor, ConsoleSpanExporter } = require ("@opentelemetry/sdk-trace-base");
4const { NodeTracerProvider } = require("@opentelemetry/sdk-trace-node");
5const { registerInstrumentations } = require('@opentelemetry/instrumentation');
6const { HttpInstrumentation } = require ('@opentelemetry/instrumentation-http');
7const { ExpressInstrumentation } = require ('@opentelemetry/instrumentation-express');
8// DELETE IF SETTING UP A GATEWAY, UNCOMMENT OTHERWISE
9// const { GraphQLInstrumentation } = require ('@opentelemetry/instrumentation-graphql');
10
11// Register server-related instrumentation
12registerInstrumentations({
13  instrumentations: [
14    new HttpInstrumentation(),
15    new ExpressInstrumentation(),
16    // DELETE IF SETTING UP A GATEWAY, UNCOMMENT OTHERWISE
17    //new GraphQLInstrumentation()
18  ]
19});
20
21// Initialize provider and identify this particular service
22// (in this case, we're implementing a federated gateway)
23const provider = new NodeTracerProvider({
24  resource: Resource.default().merge(new Resource({
25    // Replace with any string to identify this service in your system
26    "service.name": "gateway",
27  })),
28});
29
30// Configure a test exporter to print all traces to the console
31const consoleExporter = new ConsoleSpanExporter();
32provider.addSpanProcessor(
33  new SimpleSpanProcessor(consoleExporter)
34);
35
36// Register the provider to begin tracing
37provider.register();

For now, this code does not push trace data to an external system. Instead, it prints that data to the console for debugging purposes.

After you make these changes to your app, start it up locally. It should begin printing trace data similar to the following:

JavaScript
{
  traceId: '0ed36c42718622cc726a661a3328aa61',
  parentId: undefined,
  name: 'HTTP POST',
  id: '36c6a3ae19563ec3',
  kind: 1,
  timestamp: 1624650903925787,
  duration: 26793,
  attributes: {
    'http.url': 'http://localhost:4000/',
    'http.host': 'localhost:4000',
    'net.host.name': 'localhost',
    'http.method': 'POST',
    'http.route': '',
    'http.target': '/',
    'http.user_agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
    'http.request_content_length_uncompressed': 1468,
    'http.flavor': '1.1',
    'net.transport': 'ip_tcp',
    'net.host.ip': '::1',
    'net.host.port': 4000,
    'net.peer.ip': '::1',
    'net.peer.port': 39722,
    'http.status_code': 200,
    'http.status_text': 'OK'
  },
  status: { code: 1 },
  events: []
}

{
  traceId: '0ed36c42718622cc726a661a3328aa61',
  parentId: '36c6a3ae19563ec3',
  name: 'middleware - <anonymous>',
  id: '3776786d86f24124',
  kind: 0,
  timestamp: 1624650903934147,
  duration: 63,
  attributes: {
    'http.route': '/',
    'express.name': '<anonymous>',
    'express.type': 'middleware'
  },
  status: { code: 0 },
  events: []
}
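Each printed object is one span, and `parentId` links a span to its parent. To read this output as a tree, you could use a small helper like the one below. `formatSpanTree` is hypothetical; it assumes the field names shown above and that `kind` holds the numeric `SpanKind` value from @opentelemetry/api (0 = INTERNAL, 1 = SERVER, 2 = CLIENT, 3 = PRODUCER, 4 = CONSUMER). The timestamps in the sample are microseconds since the epoch, and the helper assumes `duration` uses the same unit, so it divides by 1,000 to print milliseconds.

```javascript
// Hypothetical helper: print spans from ConsoleSpanExporter as an indented tree.
// Assumes each span has the fields shown above (id, parentId, name, kind,
// duration) and that duration is in microseconds.
const SPAN_KIND_NAMES = ["INTERNAL", "SERVER", "CLIENT", "PRODUCER", "CONSUMER"];

function formatSpanTree(spans) {
  const ids = new Set(spans.map((span) => span.id));
  const childrenByParent = new Map();
  const roots = [];
  for (const span of spans) {
    // A span whose parent isn't in this list is treated as a root
    if (!span.parentId || !ids.has(span.parentId)) {
      roots.push(span);
      continue;
    }
    if (!childrenByParent.has(span.parentId)) childrenByParent.set(span.parentId, []);
    childrenByParent.get(span.parentId).push(span);
  }

  const lines = [];
  const visit = (span, depth) => {
    const ms = (span.duration / 1000).toFixed(3);
    const kind = SPAN_KIND_NAMES[span.kind] || `kind ${span.kind}`;
    lines.push(`${"  ".repeat(depth)}${span.name} [${kind}] ${ms} ms`);
    for (const child of childrenByParent.get(span.id) || []) visit(child, depth + 1);
  };
  roots.forEach((root) => visit(root, 0));
  return lines.join("\n");
}

// The two spans from the sample output above, with attributes omitted
const sampleSpans = [
  { parentId: undefined, name: "HTTP POST", id: "36c6a3ae19563ec3", kind: 1, duration: 26793 },
  { parentId: "36c6a3ae19563ec3", name: "middleware - <anonymous>", id: "3776786d86f24124", kind: 0, duration: 63 },
];
console.log(formatSpanTree(sampleSpans));
// HTTP POST [SERVER] 26.793 ms
//   middleware - <anonymous> [INTERNAL] 0.063 ms
```

Building the tree from `parentId` rather than from print order means the helper works regardless of the order in which spans were printed.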

Nice! Next, we can modify this code to begin pushing trace data to an external service, such as Zipkin or Jaeger.

3. Push trace data to a tracing system

Next, let's modify the code in the previous step to instead push traces to a locally running instance of Zipkin.

Note: To run Zipkin locally, see the Zipkin quickstart. If you want to use a different tracing system, consult the documentation for that system.

First, we need to replace our ConsoleSpanExporter (which prints traces to the terminal) with a ZipkinExporter, which specifically pushes trace data to a running Zipkin instance.

Install the following additional library:

Bash
npm install @opentelemetry/exporter-zipkin@1.0

Then, import the ZipkinExporter in your dedicated OpenTelemetry file:

JavaScript
open-telemetry.js
const { ZipkinExporter } = require("@opentelemetry/exporter-zipkin");

Now we can replace our ConsoleSpanExporter with a ZipkinExporter. Replace lines 30-34 of open-telemetry.js from the previous step (the console exporter's comment and code) with the following:

JavaScript
1// Configure an exporter that pushes all traces to Zipkin
2// (This assumes Zipkin is running on localhost at the 
3// default port of 9411)
4const zipkinExporter = new ZipkinExporter({
5  // url: set_this_if_not_running_zipkin_locally
6});
7provider.addSpanProcessor(
8  new SimpleSpanProcessor(zipkinExporter)
9);

Now open Zipkin in your browser at http://localhost:9411. You should be able to query recent trace data in the UI!

You can show the details of any operation and see a breakdown of its processing timeline by span.
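Under the hood, the exporter has to translate each OpenTelemetry span into Zipkin's own JSON model before sending it. The sketch below is a simplified, hypothetical version of that mapping (it is not the code in @opentelemetry/exporter-zipkin) that works on span objects shaped like the console output from step 2. It shows the main differences: Zipkin calls attributes `tags` and requires string values, records the service name in `localEndpoint`, and has no INTERNAL span kind.

```javascript
// Simplified, hypothetical sketch of mapping an OpenTelemetry span (shaped like
// the ConsoleSpanExporter output) to Zipkin's v2 JSON span model.
// This is NOT the implementation in @opentelemetry/exporter-zipkin.
const ZIPKIN_KINDS = { 1: "SERVER", 2: "CLIENT", 3: "PRODUCER", 4: "CONSUMER" };

function toZipkinSpan(span, serviceName) {
  // Zipkin stores attributes as "tags", and tag values must be strings
  const tags = {};
  for (const [key, value] of Object.entries(span.attributes || {})) {
    tags[key] = String(value);
  }
  const zipkinSpan = {
    traceId: span.traceId,
    id: span.id,
    name: span.name,
    timestamp: span.timestamp, // epoch microseconds
    duration: span.duration, // microseconds
    localEndpoint: { serviceName }, // e.g. the "service.name" resource attribute
    tags,
  };
  if (span.parentId) zipkinSpan.parentId = span.parentId;
  // Zipkin has no INTERNAL kind (0), so internal spans leave `kind` unset
  if (ZIPKIN_KINDS[span.kind]) zipkinSpan.kind = ZIPKIN_KINDS[span.kind];
  return zipkinSpan;
}

const httpSpan = {
  traceId: "0ed36c42718622cc726a661a3328aa61",
  id: "36c6a3ae19563ec3",
  name: "HTTP POST",
  kind: 1,
  timestamp: 1624650903925787,
  duration: 26793,
  attributes: { "http.method": "POST", "http.status_code": 200 },
};
console.log(JSON.stringify([toZipkinSpan(httpSpan, "gateway")], null, 2));
```

The exporter sends arrays of spans like this to Zipkin's `/api/v2/spans` endpoint. Its default URL is `http://localhost:9411/api/v2/spans`, which is why the earlier snippet needs no `url` when Zipkin runs locally.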

4. Update for production readiness

Our example telemetry configuration assumes that Zipkin is running locally, and that we want to process every span individually as it's emitted.

To prepare for production, we'll want to optimize performance by sending our traces to an OpenTelemetry Collector using the OTLPTraceExporter, and by replacing our SimpleSpanProcessor with a BatchSpanProcessor. The Collector should be deployed as a local sidecar agent that buffers traces before sending them along to their final destination. See the OpenTelemetry Collector getting started docs for an overview.

First, install the following additional library:

Bash
npm install @opentelemetry/exporter-trace-otlp-http@0.27

Then, import the OTLPTraceExporter and BatchSpanProcessor in your dedicated OpenTelemetry file:

JavaScript
const { OTLPTraceExporter } = require("@opentelemetry/exporter-trace-otlp-http");
const { BatchSpanProcessor } = require("@opentelemetry/sdk-trace-base");
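To see what the `maxQueueSize` and `scheduledDelayMillis` options in the next snippet control, here is a toy model of batched export written from scratch. It is a simplified illustration, not the SDK's implementation: `ToyBatchProcessor` and its `exportFn` callback are hypothetical, its defaults are illustrative, and the real BatchSpanProcessor has further options (such as `exportTimeoutMillis`). The model shows the trade-off: spans wait in a bounded queue and are exported together on a timer, so the app makes far fewer export calls, but spans that arrive while the queue is full are dropped.

```javascript
// Toy model of batched span export, written from scratch for illustration.
// It is NOT the SDK's BatchSpanProcessor, which has more options and logic.
class ToyBatchProcessor {
  constructor(exportFn, { maxQueueSize = 2048, maxExportBatchSize = 512, scheduledDelayMillis = 5000 } = {}) {
    this.exportFn = exportFn; // receives an array of spans per call
    this.maxQueueSize = maxQueueSize;
    this.maxExportBatchSize = maxExportBatchSize;
    this.queue = [];
    this.dropped = 0;
    // Export whatever is queued on a fixed schedule
    this.timer = setInterval(() => this.flush(), scheduledDelayMillis);
    this.timer.unref(); // don't keep the process alive just for this timer
  }

  // Called when a span ends: queue it, or drop it if the queue is full
  onEnd(span) {
    if (this.queue.length >= this.maxQueueSize) {
      this.dropped += 1;
      return;
    }
    this.queue.push(span);
  }

  // Hand queued spans to the exporter in chunks of maxExportBatchSize
  flush() {
    while (this.queue.length > 0) {
      this.exportFn(this.queue.splice(0, this.maxExportBatchSize));
    }
  }

  shutdown() {
    clearInterval(this.timer);
    this.flush();
  }
}

// Example: 5 spans into a queue that holds 3, exported 2 at a time
const exportedBatches = [];
const processor = new ToyBatchProcessor((batch) => exportedBatches.push(batch), {
  maxQueueSize: 3,
  maxExportBatchSize: 2,
  scheduledDelayMillis: 1000,
});
for (let i = 0; i < 5; i++) processor.onEnd({ name: `span-${i}` });
processor.shutdown();
console.log(exportedBatches.map((batch) => batch.length), processor.dropped); // [ 2, 1 ] 2
```

SimpleSpanProcessor, by contrast, hands each span to the exporter as soon as it ends. That is convenient for local debugging but means one export call per span.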

Now we can replace our ZipkinExporter with an OTLPTraceExporter, and our SimpleSpanProcessor with a BatchSpanProcessor. Replace lines 1-9 of the Zipkin code from the previous step (its comments included) with the following:

JavaScript
// Configure an exporter that pushes all traces to a Collector
// (This assumes the Collector is running on the default url
// of http://localhost:4318/v1/traces)
const collectorTraceExporter = new OTLPTraceExporter();
provider.addSpanProcessor(
  new BatchSpanProcessor(collectorTraceExporter, {
    maxQueueSize: 1000,
    scheduledDelayMillis: 1000,
  }),
);

You can learn more about using the OTLPTraceExporter in the instrumentation docs.
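Because BatchSpanProcessor holds spans in memory between exports, spans still queued when the process exits are lost unless the tracer provider is shut down first; the provider's `shutdown()` shuts down its span processors, which flushes pending spans. The hypothetical helper below wires that up to termination signals. It accepts any object with a Promise-returning `shutdown()` method, so it works with the `provider` from open-telemetry.js and is easy to test with a stub.

```javascript
// Hypothetical helper: shut the tracer provider down (flushing queued spans)
// when the process receives a termination signal, then exit.
// `provider` is any object whose shutdown() returns a Promise, such as the
// NodeTracerProvider created in open-telemetry.js.
function flushTelemetryOnExit(provider, proc = process, signals = ["SIGTERM", "SIGINT"]) {
  let shuttingDown = false;
  const handleSignal = () => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    provider
      .shutdown()
      .catch((error) => console.error("Error shutting down tracing", error))
      .finally(() => proc.exit(0));
  };
  for (const signal of signals) proc.once(signal, handleSignal);
  return handleSignal;
}

// Usage in open-telemetry.js, after provider.register():
// flushTelemetryOnExit(provider);
```

If your server already handles SIGTERM for its own graceful shutdown, consider calling `provider.shutdown()` from that existing path instead, so that two handlers don't race to exit the process.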

GraphQL-specific spans

The @opentelemetry/instrumentation-graphql library enables subgraphs and monoliths to emit the following spans as part of OpenTelemetry traces:

| Name | Description |
| --- | --- |
| graphql.parse | The amount of time the server spent parsing an operation string. |
| graphql.validate | The amount of time the server spent validating an operation string. |
| graphql.execute | The total amount of time the server spent executing an operation. |
| graphql.resolve | The amount of time the server spent resolving a particular field. |

Note that not every GraphQL span appears in every operation trace. This is because Apollo Server can skip parsing or validating an operation string if that string is already available in the operation cache.
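When reading a subgraph trace, it can help to total the GraphQL spans by name and note which phases are missing. The sketch below does this for span objects shaped like the console output shown earlier (`name` plus `duration` in microseconds). `summarizeGraphqlSpans` is a hypothetical helper, and the input spans are made up for illustration; a missing `graphql.parse` or `graphql.validate` span is only a hint that the operation string came from the cache, not proof.

```javascript
// Hypothetical helper: total the graphql.* spans in one trace by name and list
// which of the parse/validate phases are absent.
// Spans are plain objects with `name` and `duration` (microseconds).
function summarizeGraphqlSpans(spans) {
  const byName = {};
  for (const { name, duration } of spans) {
    if (!name.startsWith("graphql.")) continue; // skip HTTP, Express, etc.
    if (!byName[name]) byName[name] = { count: 0, totalMicros: 0 };
    byName[name].count += 1;
    byName[name].totalMicros += duration;
  }
  // Missing parse/validate spans suggest (but don't prove) a cache hit
  const skippedPhases = ["graphql.parse", "graphql.validate"].filter((phase) => !byName[phase]);
  return { byName, skippedPhases };
}

// Made-up spans for illustration: an operation with no parse/validate spans
const exampleSpans = [
  { name: "HTTP POST", duration: 5000 },
  { name: "graphql.execute", duration: 3000 },
  { name: "graphql.resolve", duration: 500 },
  { name: "graphql.resolve", duration: 250 },
];
console.log(summarizeGraphqlSpans(exampleSpans));
```

Resolver spans can overlap when fields resolve concurrently, so the `graphql.resolve` total can exceed the `graphql.execute` duration.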

Note: Federated gateways must not install the @opentelemetry/instrumentation-graphql library, so these spans are not included in their traces.

Gateway-specific spans

The @apollo/gateway library emits the following spans as part of OpenTelemetry traces:

| Name | Description |
| --- | --- |
| gateway.request | The total amount of time the gateway spent serving a request. |
| gateway.validate | The amount of time the gateway spent validating a GraphQL operation string. |
| gateway.plan | The amount of time the gateway spent generating a query plan for a validated operation. |
| gateway.execute | The amount of time the gateway spent executing operations on subgraphs. |
| gateway.fetch | The amount of time the gateway spent fetching data from a particular subgraph. |
| gateway.postprocessing | The amount of time the gateway spent composing a complete response from individual subgraph responses. |
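To see where a gateway request spends its time, you can express each phase as a share of the enclosing `gateway.request` span. This hypothetical helper assumes the spans belong to one trace and carry `name` and `duration` (in microseconds) as in the console output earlier; the durations in the example are made up.

```javascript
// Hypothetical helper: each gateway.* phase's time as a percentage of the
// gateway.request span. Assumes all spans come from a single trace and have
// `name` and `duration` (microseconds).
function gatewayPhaseBreakdown(spans) {
  const request = spans.find((span) => span.name === "gateway.request");
  if (!request || request.duration <= 0) return null;

  const totals = {};
  for (const { name, duration } of spans) {
    if (!name.startsWith("gateway.") || name === "gateway.request") continue;
    totals[name] = (totals[name] || 0) + duration; // sums repeated spans, e.g. several fetches
  }

  const percentages = {};
  for (const [name, micros] of Object.entries(totals)) {
    // Percentage rounded to one decimal place
    percentages[name] = Math.round((micros / request.duration) * 1000) / 10;
  }
  return percentages;
}

// Made-up durations for illustration
const exampleGatewaySpans = [
  { name: "gateway.request", duration: 10000 },
  { name: "gateway.validate", duration: 500 },
  { name: "gateway.plan", duration: 1500 },
  { name: "gateway.execute", duration: 7000 },
  { name: "gateway.fetch", duration: 4000 },
  { name: "gateway.fetch", duration: 3000 },
  { name: "gateway.postprocessing", duration: 200 },
];
console.log(gatewayPhaseBreakdown(exampleGatewaySpans));
```

The shares can add up to more than 100%: fetch time is part of executing operations on subgraphs, so `gateway.fetch` overlaps `gateway.execute`, and fetches to different subgraphs may also run in parallel.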