A Cookie Study

by Karen Coyle

Cookies Today

Today, most cookies that you receive are set not by the site that you have visited but by the advertisers or advertising agencies behind its banner ads. The cookies that they set are primarily identifiers, that is, unique IDs that can be used to track you as you visit sites on the Web. Although the rule still holds that a cookie can only be read by the site that set it, the same advertising agency's cookies can appear on many Web sites, so some picture of your visits to various sites is being compiled.
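To make the mechanism concrete, here is a toy simulation in Python. It is only a sketch, not real browser or ad-server code: the class names, the "uid-" ID format, and the example.* host names are all invented for illustration. It shows how one agency, reading back only the cookie it set itself, can still assemble a list of the sites a browser has visited.

```python
import itertools


class AdServer:
    """Stands in for one advertising agency whose banners run on many sites."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.visits = {}  # cookie ID -> sites where this agency's banner was shown

    def serve_banner(self, cookie_id, referring_site):
        # First contact: the browser sent no cookie, so issue a fresh unique ID.
        if cookie_id is None:
            cookie_id = f"uid-{next(self._next_id)}"
        self.visits.setdefault(cookie_id, []).append(referring_site)
        return cookie_id  # handed back to the browser as the cookie value


class Browser:
    def __init__(self):
        self.cookie_jar = {}  # host that set the cookie -> cookie value

    def visit(self, site, ad_host, ad_server):
        # The cookie is returned only to the host that set it ...
        stored = self.cookie_jar.get(ad_host)
        # ... but that host's banner is embedded in many different sites.
        self.cookie_jar[ad_host] = ad_server.serve_banner(stored, site)


agency = AdServer()
browser = Browser()
for site in ["news.example.com", "shop.example.org", "weather.example.net"]:
    browser.visit(site, "ads.example.net", agency)

print(agency.visits)
```

After three visits the browser holds a single cookie, and the agency holds a record of all three sites under that one ID.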

I've been cookie conscious for a long time, and for years I kept my Netscape cookies.txt file emptied out and set to read-only to avoid any cookie interactions. This worked pretty well in Netscape because the cookies appear to be kept in memory until the program closes down, so I could still engage in transactions that require cookies, such as making purchases, for the duration of the session.

Recently I decided to try a cookie cutter program and installed Cookie Pal (shareware, www.kburra.com). After a few months of use I had a very short list (two items) of cookies that I accept, and a very long list (over 100 items) of cookies that I always reject. New cookies are presented to me for my judgment, and nearly all go into the "never accept" category. I can see statistics on how many cookies I encounter in a day of Net use, and the numbers are frighteningly high, reaching the high double and low triple digits.

As the list of rejected cookies has grown, I've found myself getting curious about them. In particular, I wondered how easy or hard it would be to find out more about them. Ideally, before rejecting or accepting a cookie I would have some idea of who I'm dealing with. So I copied down all of my rejected cookies and started on a hunt.

The Cookie Study

These results are not "scientific" in the sense that they relate only to the cookies captured on my machine and the results cannot be extrapolated to what might be on anyone else's computer. Certainly my web surfing is not random and the methodology here is simply what worked for me at the time.

Methodology

I eliminated some of the cookies that I could easily recognize (yahoo.com, for example), but that was only a small handful. (Later I counted among the cookies 7 sites that I could recognize as ones that I actually visit.) What was left were 105 cookie "hosts". I took each host as listed in the cookie cutter program (e.g. ad.doubleclick.net) and plugged it into my browser. If I got to a web site, I looked to see if the opening screen of the site had a privacy policy posted.
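For anyone who wants to repeat the exercise without typing each host into a browser, the procedure could be automated roughly as follows. This is a sketch under stated assumptions, not the method used in the study, which was done by hand: the function names are invented, the "www." retry mirrors the reformulation mentioned in the results, and searching the page for the word "privacy" is only a crude stand-in for looking at the opening screen.

```python
import http.client
import re
import urllib.request


def candidate_urls(host):
    """URLs to try for a cookie host: as listed, then with a "www." prefix."""
    urls = [f"http://{host}/"]
    if not host.startswith("www."):
        urls.append(f"http://www.{host}/")
    return urls


def mentions_privacy(html):
    """Assumption: a page with a privacy notice uses the word 'privacy'."""
    return re.search(r"privacy", html, re.IGNORECASE) is not None


def fetch(url, timeout=10):
    """Return the page text, or None if the request fails or times out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        return None


def survey(host, fetch_page=fetch):
    """Classify one host: 'no site', 'no privacy notice', or 'privacy notice'."""
    for url in candidate_urls(host):
        page = fetch_page(url)
        if page is not None:
            return "privacy notice" if mentions_privacy(page) else "no privacy notice"
    return "no site"
```

Calling survey("ad.doubleclick.net") would make a live request. Passing a different fetch_page function lets the classification logic be tried out offline.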

Results

  1. Of 105 host names, 2/3 took me to a web page and 1/3 failed or simply did not respond.

    This means that although each cookie is supposedly being posted by a "host", many of the cookies do not lead anywhere on the Net where you can verify who is sending them. In other words, they are anonymous. Sometimes it was possible to find a site by reformulating the host address (adding "www", for instance), but not always. A whois lookup (which I think is beyond the ken of the average Net user) often yielded little more information: a cookie from xyz.com would yield a record for xyz company with a post office box and no individual names listed ("postmaster@xyz.com").

  2. Not only are many senders of cookies hiding their identity, they are of dubious technical ability. Five of the sites I visited gave me a default server page -- the one that is installed as "index.html" along with the server and stays there until someone creates a home page. One gave me a page that said simply "TEST". One linked me to the Belarus Internet Java Group (?), but the only link on that page gave an error.

    The most astonishing, though, was the site that dropped me into their directory listing where I could view all of their files, including the lists of customers and the banner ads for each of them.

    It's no comfort to think that these people might be writing to my hard drive.

  3. Of the hosts that did link to a site, fewer than half had a privacy notice on the site. (And the privacy notices say the usual blah blah about using aggregated data for marketing.)

  4. About the same number (fewer than half, but not exactly the same sites) were nothing but Internet advertising companies. The larger set were sites that sold products or services themselves. But if we assume (and it's probably a good assumption) that the hosts that did not lead to a site were also advertising companies, then about 60% of the total cookies were from advertising companies, and only 40% from actual Internet sites.

  5. I eliminated some sites that I know I visit (less than 10% of the total). The others came from banner ads and however else cookies are set as we travel around the Net.
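The 60/40 split in item 4 can be checked with a little arithmetic. The sketch below takes the stated fractions at face value and treats "about 60%" as exactly 60%, which is an assumption. The resulting count of 28 advertising-only sites is implied by those rounded figures and is not a number reported in the study.

```python
from fractions import Fraction

total_hosts = 105
no_site = total_hosts * Fraction(1, 3)         # 35 hosts failed or did not respond
with_site = total_hosts - no_site              # 70 hosts led to a web page

# Assumption: read "about 60%" as exactly 60% and solve backwards.
ad_total = total_hosts * Fraction(60, 100)     # 63 hosts counted as advertisers
ad_with_site = ad_total - no_site              # 28 of them actually had a site
ad_share_of_sites = ad_with_site / with_site   # 2/5 of the hosts with a site

print(ad_with_site, ad_share_of_sites)
```

Under those assumptions, 40% of the hosts that led to a site were advertising companies, which agrees with the "fewer than half" reported in item 4.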

I'm not exactly sure what all of this means, but it confirms my impression of the nature of the cookies that I encounter. The use of cookies is a kind of "stealth" marketing where the marketing company gathers information about the Net user but does not allow the user to learn anything about the company or its practices. People are not so wrong when they say that they feel that cookies are spying on them, because the mechanism is very much like spying.

Suggestions

I would like my cookie cutter (and I think some of them do this) to keep separate lists for the cookies that come from banner ads and for the ones that are set by the site I have actively visited. To me this is an important distinction. I would also like to see a discussion of what is meant by "host" in the Cookie protocol (RFC 2109). Is it really a host if it doesn't lead to a site on the Web? And if "host" doesn't imply a web presence, should the protocol require that it does?
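The distinction asked for here could be drawn by comparing the host that sets the cookie with the host of the page being visited. The following is a minimal sketch with invented function names. Its guess at the "site" part of a host name (the last two labels) is a simplifying assumption that gives wrong answers for domains such as example.co.uk.

```python
def site_of(host):
    """Naive guess at the site a host belongs to: its last two labels."""
    return ".".join(host.lower().strip(".").split(".")[-2:])


def classify_cookie(page_host, cookie_host):
    """Label a cookie by whether it comes from the site being visited."""
    if site_of(page_host) == site_of(cookie_host):
        return "visited site"
    return "third party"


print(classify_cookie("www.yahoo.com", "yahoo.com"))
print(classify_cookie("www.yahoo.com", "ad.doubleclick.net"))
```

A cookie cutter that applied a rule like this could file the first cookie under the site actually visited and the second under banner ads and other third parties.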
©Karen Coyle, 2000
This work is licensed under a Creative Commons License.