
Cache


An anonymous reader writes "There is a growing interest in who tracks us, and many folks are restricting the use of web cookies and Flash to cut down on how advertisers (and others) can track them. Those things are fine as far as they go, but some sites are using the ETag header as an identifier: Attentive readers might have noticed already how you can use this to track people: the browser sends back to the server the information it previously received (the ETag). That sounds an awful lot like cookies, doesn't it? The server can simply give each browser a unique ETag, and when that browser connects again the server can look it up in its database. Neither JavaScript nor any other plugin has to be enabled for this to work, and changing your IP address is useless as well. The only usable workaround seems to be clearing one's cache, or using private browsing with HTTPS on sites where you don't want to be tracked. The Firefox add-on SecretAgent also does ETag overwriting."
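
For the curious, here is a minimal sketch of the server-side logic: hand each new browser a unique ETag, then recognize the browser later from the If-None-Match header it sends back when revalidating its cached copy. The tag format, table size, and function names below are invented for illustration; this is not any particular site's implementation.

    #include <stdio.h>
    #include <string.h>

    #define MAX_CLIENTS 1024

    static char known_tags[MAX_CLIENTS][33];
    static int  num_tags = 0;

    /* Return the index of a previously issued tag, or -1 if we have never seen it. */
    static int lookup_tag(const char *if_none_match)
    {
        if (if_none_match == NULL)
            return -1;
        for (int i = 0; i < num_tags; i++)
            if (strcmp(known_tags[i], if_none_match) == 0)
                return i;
        return -1;
    }

    /* Decide which ETag value to send back for this request. */
    static const char *etag_for_request(const char *if_none_match)
    {
        int id = lookup_tag(if_none_match);
        if (id < 0 && num_tags < MAX_CLIENTS) {     /* new visitor: mint a fresh tag */
            id = num_tags++;
            snprintf(known_tags[id], sizeof known_tags[id], "track-%08d", id);
        }
        return id < 0 ? "" : known_tags[id];        /* returning visitor: same tag */
    }

    int main(void)
    {
        /* First visit: the browser sends no If-None-Match header. */
        const char *t1 = etag_for_request(NULL);
        printf("ETag: \"%s\"\n", t1);

        /* Later visit: the browser revalidates with the tag it was given,
           so the server recognizes it without any cookie or JavaScript. */
        const char *t2 = etag_for_request(t1);
        printf("ETag: \"%s\"  (same visitor: %s)\n", t2,
               strcmp(t1, t2) == 0 ? "yes" : "no");
        return 0;
    }

Clearing the cache discards the stored tag, which is why that is the workaround mentioned above.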

Original author: Todd Hoff

It's not often you get so enthusiastic a recommendation for a paper as Sergio Bossa gives Memory Barriers: a Hardware View for Software Hackers: "If you only want to read one piece about CPU architecture, cache coherency and memory barriers, make it this one."

It is a clear and well-written article. It even has a quiz. What's it about?

So what possessed CPU designers to cause them to inflict memory barriers on poor unsuspecting SMP software designers?

In short, because reordering memory references allows much better performance, and so memory barriers are needed to force ordering in things like synchronization primitives whose correct operation depends on ordered memory references.

Getting a more detailed answer to this question requires a good understanding of how CPU caches work, and especially what is required to make caches really work well. The following sections:

  1. present the structure of a cache,
  2. describe how cache-coherency protocols ensure that CPUs agree on the value of each location in memory, and, finally,
  3. outline how store buffers and invalidate queues help caches and cache-coherency protocols achieve high performance.

We will see that memory barriers are a necessary evil that is required to enable good performance and scalability, an evil that stems from the fact that CPUs are orders of magnitude faster than are both the interconnects between them and the memory they are attempting to access.
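
As a rough illustration of the point, here is the classic store-buffering pattern the paper analyzes: each thread stores to its own flag and then loads the other's. Because a store can sit in a per-CPU store buffer while later loads proceed, both threads can read 0 unless a full barrier orders the store before the load. This is a sketch in C11 with pthreads, not code from the paper (the paper itself works in terms of kernel-style primitives such as smp_mb()).

    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    static atomic_int x, y;          /* both start at 0 */
    static int r1, r2;

    static void *thread_a(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&x, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);   /* without this fence, r1 == r2 == 0 is possible */
        r1 = atomic_load_explicit(&y, memory_order_relaxed);
        return NULL;
    }

    static void *thread_b(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&y, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);
        r2 = atomic_load_explicit(&x, memory_order_relaxed);
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, thread_a, NULL);
        pthread_create(&b, NULL, thread_b, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        /* With both fences in place, at least one thread must observe the other's store. */
        printf("r1=%d r2=%d\n", r1, r2);
        return 0;
    }

Real synchronization code uses the cheapest barrier that suffices for the situation rather than a full fence everywhere, which is exactly the trade-off the paper walks through.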

Original author: Todd Hoff

When you have a large population of servers you have both the opportunity and the incentive to perform interesting studies. Authors from Google and the University of California, in Optimizing Google’s Warehouse Scale Computers: The NUMA Experience, conducted such a study, looking at how jobs run on clusters of machines with a NUMA architecture. Since NUMA is common on server-class machines, it's a topic of general interest for anyone looking to maximize machine utilization across clusters.
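
For background, the performance question NUMA raises is locality: each socket has memory attached to it, and a task running on one socket but touching memory attached to another pays extra latency across the interconnect. A minimal Linux/libnuma sketch of placing an allocation on a specific node, purely illustrative and not code from the paper:

    #include <numa.h>     /* link with -lnuma */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "libnuma: NUMA is not available on this system\n");
            return 1;
        }
        printf("NUMA nodes: %d\n", numa_max_node() + 1);

        /* Place the buffer on node 0: threads running on node 0 get local
           (fast) accesses, threads on other nodes go across the interconnect. */
        size_t sz = 64UL * 1024 * 1024;
        char *buf = numa_alloc_onnode(sz, 0);
        if (buf == NULL) {
            perror("numa_alloc_onnode");
            return 1;
        }
        memset(buf, 0, sz);      /* touch the pages so they are actually allocated */
        numa_free(buf, sz);
        return 0;
    }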

Some of the results are surprising:


If you are reading this subreddit, you are probably familiar with asymptotic algorithmic complexity ("big-O" notation). In my experience, an algorithm's time complexity is usually derived assuming a simple machine model:

  • any "elementary" operation on any data takes one unit of time
  • at most one operation can be done in a unit of time

I have read a little about the algorithmic complexity of parallel algorithms, where at most P operations per unit of time are allowed (P = number of processors); this seems to be a straightforward extension of the machine model above.

These machine models, however, ignore the latency of transferring data to the processor. For example, a dot product of two arrays and a linear search of a linked list are both O(N) under the usual machine model. When data transfer latency is taken into account, I would say the dot product is still O(N): the "names" (addresses) of the data are known well before they are used, so any transfer delays can be overlapped (i.e. pipelined). In a linked-list traversal, however, the name of the datum for the next operation is unknown until the previous operation completes; hence I would put the complexity at O(N × T(N)), where T(n) is the transfer latency of an arbitrary element from a set of n data elements.
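
To make the contrast concrete, here is a small C sketch of the two loops (my own illustration of the argument, not a benchmark): in the dot product the addresses a[i] and b[i] are computable in advance, so the hardware can issue the loads early and overlap the transfers, while in the list walk each load's address depends on the result of the previous load, so the latencies add up serially.

    #include <stdio.h>

    /* O(N) with predictable addresses: a[i] and b[i] are known ahead of time,
       so loads can be issued early and the memory latency overlaps (pipelines). */
    static double dot(const double *a, const double *b, size_t n)
    {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i] * b[i];
        return s;
    }

    struct node { double v; struct node *next; };

    /* O(N) with dependent addresses: p->next is unknown until the current node
       has been fetched, so each step pays the full transfer latency T(n). */
    static double list_sum(const struct node *p)
    {
        double s = 0.0;
        for (; p != NULL; p = p->next)
            s += p->v;
        return s;
    }

    int main(void)
    {
        double a[4] = {1, 2, 3, 4}, b[4] = {4, 3, 2, 1};
        struct node n3 = {3.0, NULL}, n2 = {2.0, &n3}, n1 = {1.0, &n2};
        printf("dot = %g, list sum = %g\n", dot(a, b, 4), list_sum(&n1));
        return 0;
    }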

I think this (or a similar) machine model can be useful, since data transfer latency is an important consideration for many problems with large input sets.

I realize that these machine models have probably been already proposed and studied and I would greatly appreciate any pointers in this direction.

EDIT: It turns out I was right: this idea was proposed as early as 1987, and, following the citations, there seems to be a lot of follow-up work.

submitted by baddeed
