VS Code - What’s the deal with the telemetry?

The word telemetry is derived from the Greek: tele meaning “remote”, and metron meaning “measure”. And that is what it is really: remote measuring.

When I think of telemetry, I think of NASA’s mission control in Houston monitoring the Apollo 11 rocket in space. By gathering data from the rocket’s systems and the external conditions, and beaming it to Earth, mission control assisted the astronauts in vital decisions that got them to the moon and back safely, a first for mankind. That’s rather cool, but what is my code editor up to?

toy rocket drawn in 3d cartoon style
This is not Apollo 11

I ain’t on no space mission!

Why is there telemetry in VS Code?

In software, telemetry is used to gather data on the use and performance of applications. Microsoft states in their docs on telemetry that they do the following:

Visual Studio Code collects telemetry data, which is used to help understand how to improve the product. For example, this usage data helps to debug issues, such as slow start-up times, and to prioritize new features. While we appreciate the insights this data provides, we also know that not everyone wants to send usage data and you can disable telemetry as described in disable telemetry reporting.

The three different types of data that Microsoft say they gather with regard to telemetry are:

  1. Usage Data: Information about how features are used and perform in VS Code which help prioritize future product improvements.
  2. Crash Reports: Diagnostic information when VS Code crashes.
  3. Error Data: Information about errors that do not crash the application but are unexpected.
eggs in a carton with googly, glued-on eyes
Egg company wants to monitor how you eat your eggs in the name of product improvement

An obvious concern is that data collected could be used to profile users. Many businesses consider profiling users fair game these days, and are not forthcoming about exactly what data is collected and what the data is used for.

Now there is stronger data protection legislation to combat inappropriate data collection. There is the General Data Protection Regulation (GDPR) in Europe, and other jurisdictions have adopted similar legislation, e.g. the California Consumer Privacy Act (CCPA), which was adopted on 28 June 2018. However, there are still various inconsistencies in the practical and technical implementation of GDPR, and enforcement is challenging, to say the least. Big companies can afford to take more chances as they can pay the fines if they come.

What data is collected exactly?

conceptual photo of wired connections

You can generate a JSON report of the possible telemetry events that VS Code can record. You can do this on the command-line using the --telemetry flag. The report is generated per build and does not contain extension telemetry unless the extension author adds a telemetry.json file to their root build directory.

I ran code --telemetry > telemetry.json to generate a report. Here is a GitHub Gist of the report.json. It contains nearly 1600 possible events (if I am interpreting it correctly). That is a lot of data points.

This report gives you a rough idea of what type of data is collected, but I didn’t take any major insights from it. It is probably better to log the events to get a proper grasp of how that data looks. Let’s give that a go.
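If you want to check the event count yourself, a few lines of Python will do it. This is only a minimal sketch: I am assuming the report nests its events under a top-level "events" key, with each event mapping property names to annotation objects (like the sync.userAccount entry shown later in this post). The function name summarize_report is my own, and it falls back to treating the whole document as the event map if that key is missing.

```python
def summarize_report(report: dict) -> dict:
    """Count the events and annotated properties in a telemetry report."""
    # Assumption: events sit under a top-level "events" key. If that key
    # is missing, treat the whole document as the event map instead.
    events = report.get("events", report)
    property_count = sum(
        len(props) for props in events.values() if isinstance(props, dict)
    )
    return {"events": len(events), "properties": property_count}


# Tiny stand-in for the real report, built from the sync.userAccount entry.
annotation = {
    "classification": "EndUserPseudonymizedInformation",
    "purpose": "BusinessInsight",
    "endPoint": "none",
}
sample = {"events": {"sync.userAccount": {"id": annotation, "providerid": annotation}}}
print(summarize_report(sample))
```

For the real file, load telemetry.json with json.load and pass the result to summarize_report instead of the stand-in.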

Logging the telemetry data

conceptual photo of long log

You can log the telemetry events in VS Code as they are sent by doing the following:

  1. Run the “Developer: Set Log Level…” command and select the option Trace.
  2. Then in the Output pane, pick Log (Telemetry) from the dropdown. It is the very, very last item. It is easy to miss because of the size of the dropdown!
  3. These events are also recorded to an actual log file. You can view it using the “Developer: Open Log File…” command and choosing Telemetry from the dropdown.

As an example, I turned on all telemetry while writing some of this blog post. This is the log file with some data anonymized.

Here is an excerpt:

Log
[2022-04-11 17:41:02.344] [2d0717af-bf00-4938-8393-a69f692747a9] [trace] telemetry/settingsEditor.settingModified {"properties":{"key":"telemetry.telemetryLevel","groupId":"local","target":"user","common.machineId":"XYZ","sessionID":"07c32ecf-a814-4c69-a156-dbcb9930bc891649683267895","commitHash":"8dfae7a5cd50421d30ddo9cb873990460525a898","version":"1.66.1","common.platformVersion":"5.13.0","common.platform":"Linux","common.nodePlatform":"linux","common.nodeArch":"x64","common.product":"desktop","timestamp":"2022-04-11T16:41:02.218Z","common.snap":"true","common.version.shell":"17.2.0","common.version.renderer":"98.0.4758.109","common.firstSessionDate":"Sun, 08 Nov 2020 16:09:49 GMT","common.lastSessionDate":"Mon, 11 Apr 2022 11:28:34 GMT","common.isNewSession":"0","common.remoteAuthority":"none"},"measurements":{"displayIndex":2,"showConfiguredOnly":0,"isReset":1,"common.timesincesessionstart":11994323,"common.sequence":1}}
[2022-04-11 17:41:04.307] [2d0717af-bf00-4938-8393-a69f692747a9] [trace] telemetry/fileGet {"properties":{"mimeType":"application/unknown","ext":".git","common.machineId":"XYZ","sessionID":"07c32ecf-a814-4c69-a156-dbcb9930bc891649683267895","commitHash":"8dfae7a5cd50421d30ddo9cb873990460525a898","version":"1.66.1","common.platformVersion":"5.13.0","common.platform":"Linux","common.nodePlatform":"linux","common.nodeArch":"x64","common.product":"desktop","timestamp":"2022-04-11T16:41:04.302Z","common.snap":"true","common.version.shell":"17.2.0","common.version.renderer":"98.0.4758.109","common.firstSessionDate":"Sun, 08 Nov 2020 16:09:49 GMT","common.lastSessionDate":"Mon, 11 Apr 2022 11:28:34 GMT","common.isNewSession":"0","common.remoteAuthority":"none"},"measurements":{"path":1926235831,"reason":2,"common.timesincesessionstart":11996407,"common.sequence":2}}
[2022-04-11 17:41:27.122] [2d0717af-bf00-4938-8393-a69f692747a9] [trace] telemetry/fileGet {"properties":{"mimeType":"application/unknown","ext":".git","common.machineId":"XYZ","sessionID":"07c32ecf-a814-4c69-a156-dbcb9930bc891649683267895","commitHash":"8dfae7a5cd50421d30ddo9cb873990460525a898","version":"1.66.1","common.platformVersion":"5.13.0","common.platform":"Linux","common.nodePlatform":"linux","common.nodeArch":"x64","common.product":"desktop","timestamp":"2022-04-11T16:41:27.121Z","common.snap":"true","common.version.shell":"17.2.0","common.version.renderer":"98.0.4758.109","common.firstSessionDate":"Sun, 08 Nov 2020 16:09:49 GMT","common.lastSessionDate":"Mon, 11 Apr 2022 11:28:34 GMT","common.isNewSession":"0","common.remoteAuthority":"none"},"measurements":{"path":1926235831,"reason":2,"common.timesincesessionstart":12019226,"common.sequence":3}}
[2022-04-11 17:42:41.752] [2d0717af-bf00-4938-8393-a69f692747a9] [trace] telemetry/workbenchActionExecuted {"properties":{"id":"editor.action.clipboardCutAction","from":"keybinding","common.machineId":"XYZ","sessionID":"07c32ecf-a814-4c69-a156-dbcb9930bc891649683267895","commitHash":"8dfae7a5cd50421d30ddo9cb873990460525a898","version":"1.66.1","common.platformVersion":"5.13.0","common.platform":"Linux","common.nodePlatform":"linux","common.nodeArch":"x64","common.product":"desktop","timestamp":"2022-04-11T16:42:41.741Z","common.snap":"true","common.version.shell":"17.2.0","common.version.renderer":"98.0.4758.109","common.firstSessionDate":"Sun, 08 Nov 2020 16:09:49 GMT","common.lastSessionDate":"Mon, 11 Apr 2022 11:28:34 GMT","common.isNewSession":"0","common.remoteAuthority":"none"},"measurements":{"common.timesincesessionstart":12093846,"common.sequence":4}}

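It records a lot of the actions I took. The excerpt above alone shows a change to the telemetry.telemetryLevel setting, two fileGet events for a path with a .git extension, and a clipboard cut triggered by a keybinding.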

For the events, it keeps track of a common.machineId, a sessionID, session times, and various fields about the VS Code installation.
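Reading the raw log by eye gets tiring quickly, so it can help to tally the events with a script. The sketch below assumes every telemetry line follows the shape seen in the excerpt: a bracketed timestamp, a bracketed ID, a bracketed log level, then telemetry/ followed by the event name and a JSON payload. The function names are mine, and the sample lines are abbreviated versions of the excerpt, not full log entries.

```python
import json
import re
from collections import Counter

# Matches: [timestamp] [id] [level] telemetry/<eventName> <JSON payload>
LINE_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<id>[^\]]+)\] \[(?P<level>[^\]]+)\] "
    r"telemetry/(?P<event>\S+) (?P<payload>\{.*\})$"
)


def parse_line(line: str):
    """Return (event_name, payload_dict), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("event"), json.loads(match.group("payload"))


def count_events(lines) -> Counter:
    """Tally how often each telemetry event name appears."""
    parsed = (parse_line(line) for line in lines)
    return Counter(item[0] for item in parsed if item is not None)


# Abbreviated lines based on the excerpt (most properties removed).
PREFIX = "[2d0717af-bf00-4938-8393-a69f692747a9] [trace] telemetry/"
sample_lines = [
    "[2022-04-11 17:41:04.307] " + PREFIX + "fileGet "
    '{"properties":{"ext":".git"},"measurements":{"common.sequence":2}}',
    "[2022-04-11 17:41:27.122] " + PREFIX + "fileGet "
    '{"properties":{"ext":".git"},"measurements":{"common.sequence":3}}',
    "[2022-04-11 17:42:41.752] " + PREFIX + "workbenchActionExecuted "
    '{"properties":{"id":"editor.action.clipboardCutAction","from":"keybinding"},'
    '"measurements":{"common.sequence":4}}',
]
print(count_events(sample_lines))
```

To run it against a real session, pass the lines of the telemetry log file to count_events instead of sample_lines.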

The volume of events tracked seems excessive to me. If you multiply these events by thousands of users and thousands of hours, you are generating, sending, and storing a heck of a lot of data. It’s not good for the environment.

Next, I wanted to see what would happen when I disabled telemetry. After some activity, I checked the log, and found no events were logged. That is good.

The absence of logged events doesn’t mean that data is not being sent though! A more accurate way to see what is being sent out is to monitor your outgoing network traffic and analyze the packets. You could do this with a tool like Wireshark. It would be interesting to see what actual data is being sent out when telemetry is turned off! I will leave that investigation up to you as I don’t have much experience in that area.

Disabling telemetry

To disable telemetry, you need to edit the setting telemetry.telemetryLevel, and the runtime argument enable-crash-reporter. These settings are enabled by default.
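The telemetry.telemetryLevel setting is not a plain on/off switch. As I understand the VS Code documentation, it accepts four values (all, error, crash, and off), and each one sends a subset of the level before it. Here is a minimal Python sketch of that mapping onto the three data types listed earlier. The names LEVELS and data_sent are mine, and the mapping only describes the core product, not extensions or the separate crash reporter argument.

```python
# Data categories sent at each value of telemetry.telemetryLevel,
# as I read the VS Code telemetry documentation.
LEVELS = {
    "all": {"usage data", "error data", "crash reports"},
    "error": {"error data", "crash reports"},
    "crash": {"crash reports"},
    "off": set(),
}


def data_sent(settings: dict) -> set:
    """Return the telemetry categories a settings mapping allows."""
    # "all" is the default when the setting is absent.
    level = settings.get("telemetry.telemetryLevel", "all")
    if level not in LEVELS:
        raise ValueError(f"unknown telemetry level: {level!r}")
    return LEVELS[level]


print(sorted(data_sent({"telemetry.telemetryLevel": "error"})))
print(sorted(data_sent({"telemetry.telemetryLevel": "off"})))
```

The point of the sketch is that anything short of off still sends something.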

If you don’t want to send any telemetry data to Microsoft, then you can add this to your settings.json:

JSON
"telemetry.telemetryLevel": "off"

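To stop the sending of crash data to Microsoft, you need to set the enable-crash-reporter runtime argument to false by:

  1. Running the “Preferences: Configure Runtime Arguments” command, which opens the argv.json file.
  2. Setting "enable-crash-reporter": false in that file.
  3. Restarting VS Code.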

All done?

toy rocket drawn in 3d cartoon style

Nope.

It looks like you cannot shut telemetry off 100%. These settings will opt you out of most data sharing scenarios, but not all of them. See the excerpt below from section 2a of the product license:

The software may collect information about you and your use of the software, and send that to Microsoft. Microsoft may use this information to provide services and improve our products and services. You may opt-out of many of these scenarios, but not all, as described in the product documentation located at https://code.visualstudio.com/docs/supporting/faq#_how-to-disable-telemetry-reporting.

They can collect data for some scenarios regardless of your settings. According to VSCodium, who build an alternative product from the same codebase, that is what Microsoft does anyway:

Even though we do not pass the telemetry build flags (and go out of our way to cripple the baked-in telemetry), Microsoft will still track usage by default.

Also, extensions do their own thing for telemetry. Microsoft says:

These extensions may be collecting their own usage data and are not controlled by the telemetry.telemetryLevel setting. Consult the specific extension’s documentation to learn about its telemetry reporting and whether it can be disabled.

Since a tonne of features belong to extensions outside of the core product, that is a big surface area that is not under the control of a single setting.

Some of Microsoft’s extensions collect data. VSCodium mentions that Microsoft’s C# extension (ms-vscode.csharp) sends data to Microsoft. There does not appear to be any setting offered by the extension to turn telemetry off. I checked the installed extension folder and it does not appear to have a telemetry.json in the root directory to report its telemetry events.

In any case, you cannot opt-out of all data sharing scenarios for the C# extension according to section 3a of its license:

3. DATA.
a. Data Collection

The software may collect information about you and your use of the software, and send that to Microsoft. Microsoft may use this information to provide services and improve our products and services. You may opt-out of many of these scenarios, but not all, as described in the product documentation. There are also some features in the software that may enable you and Microsoft to collect data from users of your applications.

Sound familiar?

There seems to be a similar clause in all of Microsoft’s licenses that empowers them to collect data if they want to, regardless of your wishes.

Why have licenses like these if they are just concerned with product improvement?

toy rocket drawn in 3d cartoon style
Oh! You didn't want me to collect *that* data

Also, I recommend that you review the “online services” settings. These settings dictate what data is sent out to servers. Beyond crash reporting and telemetry, VS Code uses online services for other purposes such as downloading product updates, managing extensions, and providing natural language searching within the Settings editor.

Run the command “Preferences: Open Settings (UI)” to open the Settings UI, and type @tag:usesOnlineServices. This will display all settings that control the usage of online services and you can individually switch them on or off.

online services settings

There are also server calls made to non-Microsoft servers for:

Is your data protected under General Data Protection Regulation (GDPR)?

Microsoft state that they follow the General Data Protection Regulation (GDPR), and that these practices apply to all geographies, not just Europe.

However, you cannot access the data collected. The right of access (Article 15 of the GDPR) gives people the right to access their personal data, and to know how their personal data is being processed. Microsoft state that they cannot provide data because it is (kind of) anonymous:

One question we expect people to ask is to see the data we collect. However, we don’t have a reliable way to do this as VS Code does not have a ‘sign-in’ experience that would uniquely identify a user. We do send information that helps us approximate a single user for diagnostic purposes (this is based on a hash of the network adapter NIC on the desktop and a randomly assigned UUID on the web) but this is not guaranteed to be unique. For example, virtual machines (VMs) often rotate NIC IDs or allocate from a pool. This technique is sufficient to help us when working through problems, but it is not reliable enough for us to ‘provide your data’.

That seems odd to me. Why approximately identify people in the first place?
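To make the “hash of the network adapter” idea concrete, here is a rough Python sketch of how such an identifier could be derived. I am assuming SHA-256 over the MAC address written as text; the quote does not say which hash or input format VS Code uses, so treat this as an illustration and not their implementation. The function name pseudonymous_id is mine.

```python
import hashlib
import uuid


def pseudonymous_id(mac: int) -> str:
    """Hash a 48-bit MAC address into a hex identifier."""
    # Format the integer as aa:bb:cc:dd:ee:ff before hashing.
    mac_text = ":".join(
        f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8)
    )
    return hashlib.sha256(mac_text.encode("utf-8")).hexdigest()


# uuid.getnode() returns this machine's MAC address as an integer
# (or a random 48-bit number if no address can be found).
print(pseudonymous_id(uuid.getnode()))
```

The same network adapter produces the same value on every run, so even an “approximate” identifier like this stays stable across sessions on one machine.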

And there is a ‘sign-in’ experience in VS Code. You can sync your settings by signing into a Microsoft account or GitHub account. You may not use it, but you can obviously be identified that way. Is that data linked?

sign in for settings sync

If we look at the telemetry report that I generated earlier, and search for events related to syncing your user account, we find one as below.

JSON
"sync.userAccount": {
  "id": {
    "classification": "EndUserPseudonymizedInformation",
    "purpose": "BusinessInsight",
    "endPoint": "none"
  },
  "providerid": {
    "classification": "EndUserPseudonymizedInformation",
    "purpose": "BusinessInsight",
    "endPoint": "none"
  }
},

It is interesting that it classifies this event as “EndUserPseudonymizedInformation” with a purpose of “BusinessInsight”. Is a pseudonym an attempt to keep users anonymous?

I am speculating here somewhat. I am not sure that the “sync.userAccount” event is related to settings sync.
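If you want to repeat that search on your own report, a small Python helper can do the filtering. This sketch assumes the same report layout as the excerpt above (event name, then property name, then an annotation object with a "classification" field), possibly nested under a top-level "events" key. The function name find_events is mine, and the second event in the sample is hypothetical, added only so the filter has something to exclude.

```python
def find_events(report: dict, keyword: str) -> dict:
    """Map matching event names to {property: classification}."""
    # Assumption: events may sit under a top-level "events" key.
    events = report.get("events", report)
    needle = keyword.lower()
    matches = {}
    for name, props in events.items():
        if needle in name.lower() and isinstance(props, dict):
            matches[name] = {
                prop: meta.get("classification")
                for prop, meta in props.items()
                if isinstance(meta, dict)
            }
    return matches


pseudonymized = {
    "classification": "EndUserPseudonymizedInformation",
    "purpose": "BusinessInsight",
    "endPoint": "none",
}
sample = {
    "sync.userAccount": {"id": pseudonymized, "providerid": pseudonymized},
    # Hypothetical event, not taken from the real report.
    "example.otherEvent": {
        "duration": {
            "classification": "SystemMetaData",
            "purpose": "PerformanceAndHealth",
        }
    },
}
print(find_events(sample, "sync"))
```

Searching the full report for "sync" this way lists every candidate event at once, which would be a quicker way to test my guess than scrolling through the JSON.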

In their Visual Studio Family Data Subject Requests for the GDPR and CCPA article, they speak more directly on this matter:

Only personal data that is attached to authenticated identities can be serviced by a DSR [Data Subject Request]. So, for example, because Visual Studio Code does not support sign-in, system-generated logs from it are not attached to an authenticated identity and cannot be serviced. However, some Microsoft extensions for Visual Studio Code may provide authenticated data, and this data can be serviced by a DSR.

Is this quasi-anonymous behaviour meant to relieve themselves of the right of access (Article 15 of the GDPR)?

Under the right of access article, you can make a formal request called a data subject request (DSR) to a data controller (whoever holds your data) to take an action on that data. A data controller must provide an overview of the categories of data that are being processed (Article 15(1)(b)) as well as a copy of the actual data (Article 15(3)); furthermore, the data controller has to inform the data subject on details about the processing, such as the purposes of the processing (Article 15(1)(a)), with whom the data is shared (Article 15(1)(c)), and how it acquired the data (Article 15(1)(g)).

Microsoft have added GDPR annotations to telemetry events to describe their purpose. I guess that covers Article 15(1)(a) mentioned above. It is an enumerated variable, which can contain values: “FeatureInsight”, “PerformanceAndHealth”, “BusinessInsight”, and “SecurityAndAuditing”. It is kind of vague. What does a “BusinessInsight” event mean?
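One way to see how much weight each purpose carries is to tally the "purpose" annotations across the whole report. Again, this is a sketch under the same layout assumption as the sync.userAccount excerpt. The function name tally_purposes is mine, and the second sample event is hypothetical.

```python
from collections import Counter


def tally_purposes(report: dict) -> Counter:
    """Count how many annotated properties declare each purpose."""
    # Assumption: events may sit under a top-level "events" key.
    events = report.get("events", report)
    counts = Counter()
    for props in events.values():
        if not isinstance(props, dict):
            continue
        for annotation in props.values():
            if isinstance(annotation, dict) and "purpose" in annotation:
                counts[annotation["purpose"]] += 1
    return counts


business = {
    "classification": "EndUserPseudonymizedInformation",
    "purpose": "BusinessInsight",
    "endPoint": "none",
}
sample_report = {
    "sync.userAccount": {"id": business, "providerid": business},
    # Hypothetical event, not taken from the real report.
    "example.startup": {
        "duration": {
            "classification": "SystemMetaData",
            "purpose": "PerformanceAndHealth",
        }
    },
}
print(tally_purposes(sample_report))
```

Run against the full report, the counts would show how many properties fall under “BusinessInsight” compared with the more self-explanatory purposes.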

My own take is that I do not feel like I am adequately protected. I am not totally anonymous. I cannot access data collected. Microsoft extensions collect data implicitly, and may not offer settings to control what is sent out. The fact that all Microsoft product licenses say that users cannot opt out of all data collection makes me doubt the sincerity of the effort.

codium logo

Are there more privacy-focused alternatives to VS Code?

VSCodium is a fully open-source alternative to VS Code based on the same codebase. They try to remove all telemetry; you can read more about that here.

However, VSCodium can’t shut out all the data collection as it is the same codebase. And since extensions act independently with regard to data collection, you still need to be mindful of what extensions you install.

You would need to search around for other options if you want to find a more privacy-oriented, open-source code editor.

Final thoughts

We live in an age where data collection is pervasive. It is important to understand what data is collected by the products you use, and you should exhibit some healthy skepticism about how that data is used.

On the surface, telemetry is benign and has a reason to exist. However, if it is not done in a wholly transparent way, and you do not have total control over what data gets sent out, then I would suspect ulterior motives. When you couple this with product licenses that grant a company permission to collect your data anyway, then the control you are given is illusory.

Is this where you say, “if the product is free, you are the product”?

The boundaries are not drawn on whether you pay for something or not. Paid products collect data too. The boundaries are drawn in the terms and conditions of a product. Then, there is the actual behaviour of the product. And then, there is our behaviour towards it. Do you read the terms and conditions? Do you pay attention to what happens in the background?

Ultimately, society and law are the arbiters of what is acceptable. They are never in total agreement. Nowadays, we accept that some form of data collection is happening in a lot of places. The law is playing catch-up.

In my opinion, you should make conscious decisions about the products you use. You should understand that you are choosing a product that wants to collect your data, even if it is more or less following the law. And even if it is offering you control, it is not meaningful control.

I find this kind of behaviour Orwellian. This is a kind of doublespeak that degrades us all.
