Understanding Browser Tracking & Logins: The Invisible Trail

Heather Flanagan
March 16, 2024

This is the transcript of my first YouTube explainer video on how tracking and authentication use the same mechanisms to meet their goals, and how browsers can’t tell the difference. Likes and subscriptions are always welcome!

Introduction

Welcome to The Digital Cow Network! My name is Heather Flanagan, principal at Spherical Cow Consulting. Let’s talk about how individuals are tracked as they surf the web.

We all know we are tracked as we surf the web. Sometimes, it’s to show us targeted advertisements to get us to spend more money. More nefariously, tracking can be used to influence our political leanings or generally sow societal chaos. Why isn’t there more control to prevent this from happening? Shouldn’t there be something that protects all web users, from the least tech-savvy to the most informed individuals? Yes, but it’s not that simple. The underlying mechanisms that trackers use are the same tools used to support third-party logging into a website, also known as federated authentication.

And you know what? Browsers can’t tell the difference between when these mechanisms are used for tracking and when they are used for logging in. The two uses are technically indistinguishable. This makes changing or removing those features more complicated than you’d think.

The Dilemma of Browser Tracking

What are these mysterious mechanisms, and why are they used when people log in to a website if they are also part of an invasion of privacy? Time for a quick history lesson!

The Internet came out of a research project. Everyone knew everyone else, or at least they knew the people online had to be a part of a very small community in order to use this new thing. The web came much later, but it still had that research and exploratory mindset: a nearly greenfield belief that building little widgets people could turn into anything they could imagine was the path toward innovation. The Internet, and therefore the web, was built without robust, automatable trust mechanisms.

One of these innovations was the concept of a piece of data that would allow developers to keep information local to a browser so each web page didn’t have to ask for the same information over and over again, something called saving state. That mindset also encouraged the use of IP addresses, once upon a time the unique address for each computer on the network, as a way to indicate whether an individual (probably the only one using that computer) was allowed to access something remotely. It encouraged a lot of things, and people thought of all sorts of ways to use these building blocks of the web to do new things, from supporting third-party accounts when logging in to targeting advertisements to individuals.

That’s enough history. Let’s dive into the mechanics. From third-party cookies to IP addresses and browser fingerprinting, these tools are essential for a smooth online experience but can also be used for tracking. Understanding these mechanisms is key to navigating the digital landscape.

Segment 3: Third-Party Cookies and Privacy

HTTP cookies (also called web cookies, Internet cookies, browser cookies, or simply cookies) are small blocks of information created by a website and placed on an individual’s computer or other device by their web browser. The idea behind doing this was to make it so people didn’t have to share information every time they went to a new page. Imagine if you had to log in Every. Single. Time. you went to a new web page owned by the same company. It’s crazy talk! Being able to “maintain state” from one page to the next is great for the user experience (UX).

Here’s where it gets murky. There is more than one kind of cookie, and for the sake of this explainer, we’re going to talk about first-party and third-party cookies. First-party cookies are a way for a single domain to maintain state across all the web pages in its domain. It’s definitely a desirable thing for a clean UX. These cookies are ONLY accessible by the domain that created them. No other domain can read what’s in that cookie.

Then there are third-party cookies. These cookies are set by a domain other than the one you are visiting, usually because the page embeds content from that other domain, and that domain can read them back from any website that embeds its content. And sometimes that’s fine. Let’s look at CNN. The company CNN owns cnn.com, headlinenews.com, cnnradionet.com, money.com, and probably several others. The parent company wants visitors to have a consistent experience, which means leaving bits of data for their various domains to read. They want all their services to look and act like you (probably) want them to.
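If it helps to see the idea in code, here is a toy model of a browser’s cookie jar. It is only a sketch under big assumptions: the domains (news.example, ads.example, shop.example), the class, and the helper names are all made up for illustration, and real browsers also weigh paths, expiry, SameSite rules, partitioning, and more. What it shows is that a cookie goes back to the domain that set it, and “third party” describes where that domain’s content happens to be embedded.

```python
# Toy model of a browser cookie jar. All names here are hypothetical.
class CookieJar:
    def __init__(self):
        self._cookies = {}  # domain that set the cookie -> {name: value}

    def set_cookie(self, owner, name, value):
        self._cookies.setdefault(owner, {})[name] = value

    def cookies_for_request(self, request_domain):
        # In this model a cookie is only sent back to the domain that set it.
        return dict(self._cookies.get(request_domain, {}))


def is_third_party(page_domain, request_domain):
    # "Third party" describes the request's context: it goes to a domain
    # other than the page being visited. (Real browsers compare sites,
    # not raw hostnames; that detail is skipped here.)
    return page_domain != request_domain


jar = CookieJar()

# Visiting news.example directly: its cookie is first-party.
jar.set_cookie("news.example", "session", "abc123")

# news.example embeds content from ads.example, which sets its own cookie.
jar.set_cookie("ads.example", "visitor_id", "v-42")

# Later, shop.example embeds the same ads.example content.
sent_to_ads = jar.cookies_for_request("ads.example")
sent_to_shop = jar.cookies_for_request("shop.example")
```

In this sketch, ads.example receives its visitor_id cookie again while you are on shop.example, and shop.example itself never sees it. Swap the advertiser for a login provider and the mechanics are exactly the same.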

Third-party cookies are also used so that when you tell a website you want to log out, it knows what to log you out of. That’s a big security feature right there and something that needs to be adapted when browsers no longer support third-party cookies. Though, to be honest, implementing this properly is really hard and many sites have already given up on the idea of a secure, global logout.

Segment 4: The Complexities of Browser Fingerprints and IP Addresses

IP Addresses

Most people don’t have to think about a computer’s numeric address. That said, computers communicate via numbers, and both your devices (your phone, tablet, and desktop) and the computers hosting the websites you’re visiting all have these addresses. They are called IP addresses and are fundamental to ensuring Internet traffic knows where to go.

Once upon a time, these addresses were tightly tied to specific computers that didn’t move around. There were also enough of these addresses for all the computers on the Internet. And with that in mind, it made sense to say, “I know who is at this IP address. That address can access my computer.” While the basic assumptions about IP addresses are no longer always accurate, there’s still something to them. Some websites still create access policies based on what IP addresses are allowed. It’s a very smooth UX; the individual doesn’t have to do a thing. It just works.
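An IP-based access policy can be as small as the sketch below. The allowlist is invented for the example (the ranges are ones reserved for documentation), and the function name is my own; the only real API used is Python’s standard ipaddress module.

```python
import ipaddress

# Hypothetical allowlist. 192.0.2.0/24 and 198.51.100.0/24 are address
# ranges reserved for documentation, not real networks.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_allowed(client_ip):
    """Return True if the visitor's address falls inside an allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)
```

Notice that nothing here asks who is using the device. The address stands in for the person, which is exactly why the approach is both smooth and fragile.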

When you go to a website, one of the things that happens is that the visit is recorded in a log. Logs are necessary to troubleshoot problems. They are also a way to understand who and what is visiting a website. And if that data is shared behind the scenes, a single IP address can be followed like you can follow footprints in the snow.
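To make the footprints concrete, here is a small sketch of what following one address through shared logs might look like. The log entries, site names, and function name are all made up, and real logs carry far more detail (timestamps, user agents, and so on).

```python
from collections import defaultdict

# Made-up log entries pooled from three unrelated sites:
# (site, client IP address, path requested).
shared_logs = [
    ("news.example", "203.0.113.9", "/politics"),
    ("shop.example", "203.0.113.9", "/cart"),
    ("news.example", "198.51.100.20", "/sports"),
    ("travel.example", "203.0.113.9", "/flights"),
]


def trail_by_ip(entries):
    """Group page visits by the IP address that made them, in log order."""
    trails = defaultdict(list)
    for site, ip, path in entries:
        trails[ip].append(site + path)
    return dict(trails)


trails = trail_by_ip(shared_logs)
```

None of the three sites did anything unusual on its own. The trail only appears once the logs are combined.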

Browser Fingerprints

When you install a web browser, you probably add plug-ins, add-ons, your favorite fonts, your preferred settings, and so on. Here’s a fun fact: if you have done those things, you might have an impressively unique “fingerprint” that distinguishes your browser from all others.

There are a few websites that will let you check this out. For example, https://amiunique.org/ and https://coveryourtracks.eff.org/ will let you see just how unique your browser setup is when compared to others.

When you visit a website, it often asks your browser about all its settings to offer you the best UX. It might have a slightly different way of presenting information if you’re using Chrome or Firefox. It might pop up a sadness window if you’ve blocked ads.
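One simple way to picture a fingerprint is as a hash over whatever settings the browser reveals. The sketch below assumes a handful of invented attributes and a hypothetical fingerprint function; real fingerprinting scripts collect many more signals and combine them in their own ways.

```python
import hashlib
import json


def fingerprint(settings):
    """Reduce a dictionary of browser settings to a short identifier."""
    # Sorting the keys means the same settings always produce the same hash.
    canonical = json.dumps(settings, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Invented example settings; none of these values describe a real browser.
browser_a = {
    "user_agent": "ExampleBrowser/1.0",
    "fonts": ["Fira Sans", "Comic Neue"],
    "timezone": "America/Los_Angeles",
    "ad_blocker": True,
}

# The same setup with one font removed.
browser_b = dict(browser_a, fonts=["Fira Sans"])
```

The two setups differ by a single font, and that is enough to give them different identifiers. No cookie is stored anywhere, so there is nothing for you to clear.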

Browser fingerprinting isn’t usually used to let you log into a website. However, some enterprise software may be looking for a specific plug-in or configuration as part of its requirements to allow you access.

Link Decoration and Bounces

Link Decoration

But wait, there’s more! One of the more subtle and ubiquitous features of trackers and authentication services is link decoration. Also referred to as “navigation-based tracking,” this feature adds extra information to the URL. That information can be part of an authentication token. It can be part of a query so the website knows what to display after you’ve done a product search. When you click on a link in an email, the link often carries extra information about how you arrived, so the website owners know what advertising works for them.
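Here is a rough sketch of what that extra information looks like and how a tool might try to separate it out. The URL is invented, the list of “tracking” parameters is my own assumption based on the common utm_ naming convention, and the function name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of campaign-style parameter names. This is a guess based on
# naming alone; a name does not prove what a parameter is used for.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}


def split_decoration(url):
    """Return the URL without the listed parameters, plus what was removed."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key not in TRACKING_PARAMS]
    removed = {key: value for key, value in pairs if key in TRACKING_PARAMS}
    clean = urlunsplit(parts._replace(query=urlencode(kept)))
    return clean, removed


clean_url, decoration = split_decoration(
    "https://shop.example/search?q=cow+bells&utm_source=newsletter&utm_medium=email"
)
```

The search term and the campaign tags travel in the same part of the URL, and the only thing separating them in this sketch is a list of names someone wrote down. A parameter with an unfamiliar name could be a product filter, a login token, or a tracking identifier.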

Browser vendors are cautious when changing what they’ll support in links. They also have to worry about how to tell whether a decorated link is being used for tracking or for something else. So many features that the web experience is built on require link decoration. And so many trackers love it. It’s complicated.

Bounces or Redirects

The last one we’ll review in this post is bounces, also known as redirects. Bounces and redirects are as fundamental to the web experience as link decoration. These are used extensively to support critical login features for protocols like OAuth and SAML, used by websites everywhere. Authentication is complicated, and there is almost always more than one computer behind the scenes making it happen. These computers, possibly from different domains, need to talk to each other to get you logged in. They bounce data back and forth before returning a “go” or “no go” to your login experience.
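As a rough illustration, this is the shape of the redirect at the start and end of an OAuth 2.0 authorization code login. The parameter names (response_type, client_id, redirect_uri, state, code) come from the OAuth 2.0 specification; the endpoints, the client ID, the sample code value, and both function names are invented, and the later step where the code is exchanged for a token is left out.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit


def build_login_redirect(authorize_endpoint, client_id, redirect_uri, state):
    """Build the URL the website bounces the browser to for login."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
    })
    return authorize_endpoint + "?" + query


def read_callback(callback_url, expected_state):
    """Read the code the login service bounced back, checking the state value."""
    params = parse_qs(urlsplit(callback_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: this response is not for our request")
    return params["code"][0]


# Hypothetical website (app.example) and login service (login.example).
state = secrets.token_urlsafe(16)
login_url = build_login_redirect(
    "https://login.example/authorize", "my-app", "https://app.example/callback", state
)

# What the bounce back to the website might look like.
callback = "https://app.example/callback?" + urlencode({"code": "sample-code", "state": state})
code = read_callback(callback, state)
```

Both hops are redirects between different domains, and both carry their data as link decoration. From the browser’s point of view, that looks a lot like what a tracker does.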

Bounces are also, unfortunately, used by trackers to get around the third-party limitations on cookies described in the section above.

Remember that browsers are trying to prevent websites from reading cookies they didn’t write. Cookies are for the first party; third parties need to find another website to bother. But what if there is an advertising agreement between these two parties? They really want to share that data! The easiest thing to do is to make that third party become a first party.

When an individual visits a website directly, that makes it a first-party experience. You visit Website A; Website A is now a first party. But what if Website A has an agreement with an advertiser at Website B and routes its links through Website B, which immediately redirects you on to where you were going? Your browser has now briefly “visited” Website B, making it a first party. Website B can now set or edit a first-party cookie. They can do that every time you click through from a website they have an agreement with, thus building up a record of what sites you’ve visited, where you’ve made purchases, etc.
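Here is a toy model of the redirect version of this trick, where a link is routed through the tracker before landing on its real destination. Everything in it is hypothetical: the class, the site names, and the visitor ID format are invented, and no real network traffic is involved.

```python
class BounceTracker:
    """Toy model of a hypothetical tracker that sits in the middle of a bounce."""

    def __init__(self):
        self.next_id = 1
        self.history = {}  # visitor ID -> sites the visitor bounced in from

    def handle_bounce(self, cookie, came_from, destination):
        # The browser navigates to the tracker as a top-level page, so the
        # cookie it sets or reads here counts as first-party.
        if cookie is None:
            cookie = f"visitor-{self.next_id}"
            self.next_id += 1
        self.history.setdefault(cookie, []).append(came_from)
        # Respond by redirecting to where the person actually wanted to go.
        return cookie, destination


tracker = BounceTracker()

# First bounce: no cookie yet, so the tracker assigns one.
cookie, target = tracker.handle_bounce(None, "news.example", "https://shop.example/deal")

# Second bounce from a different site: the browser sends the same cookie back.
cookie, target2 = tracker.handle_bounce(cookie, "recipes.example", "https://travel.example/offer")
```

The person ends up where they meant to go both times, and the only visible sign is a flicker in the address bar. The tracker, meanwhile, has tied both visits to one identifier.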

Wrap Up

Browser vendors can’t just wave a magic keyboard and fix all tracking challenges. Turning off all the mechanisms that allow tracking will also break some of the most critical security features of the web. Work within the World Wide Web Consortium (W3C) is underway to find a way through this mess. Browser vendors have also been making their own changes. Apple turned off third-party cookies years ago. Mozilla turned them off by default for all Firefox users in 2023. Google Chrome is still working on it, but they are much bigger than all other browser vendors combined and have much more business riding on these features.

In our next post, I’ll focus on the work around the deprecation of third-party cookies.

I love to receive comments and suggestions on how to improve my posts! Feel free to comment here, on social media, or whatever platform you’re using to read my posts! And if you have questions, go check out Heatherbot and chat with AI-me.
