If you are reading this subreddit, you are probably familiar with asymptotic algorithmic complexity ("big-O notation"). In my experience, an algorithm's time complexity is usually derived assuming a simple machine model:

- any "elementary" operation on any data takes one unit of time
- at most one operation can be done in a unit of time

I have read a little about the complexity of parallel algorithms, where at most P operations are allowed per unit of time (P = number of processors); this seems to be a straightforward extension of the machine model above.
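To make the P-processor extension concrete, here is a minimal sketch that counts time steps for summing N numbers. It assumes one particular schedule (each processor adds up its own chunk, then the partial sums are combined pairwise in rounds); the function name `parallel_sum_steps` is made up for illustration.

```python
import math

def parallel_sum_steps(n, p):
    """Time steps to sum n numbers with p processors in the unit-cost model.

    Assumed schedule (one possibility, not the only one):
      1. each processor adds up its own chunk of about n/p numbers;
      2. the partial sums are combined pairwise, halving their count each round.
    """
    if n < 1 or p < 1:
        raise ValueError("n and p must be positive")
    p = min(p, n)                        # extra processors would sit idle
    local = math.ceil(n / p) - 1         # sequential additions inside the largest chunk
    combine = math.ceil(math.log2(p))    # rounds of pairwise combining
    return local + combine
```

With P = 1 this reduces to the N - 1 additions of the sequential model, which is why the parallel model reads as a direct extension of it.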

These machine models, however, ignore the latency of transferring data to the processor. For example, a dot product of two arrays and a linear search of a linked list are both O(N) under the usual machine model. When data transfer latency is taken into account, I would say the dot product is still O(N), since the "names" (addresses) of the data are known well before they are used, and thus any data transfer delays can be overlapped (i.e. pipelined). In a linked list traversal, however, the name of the datum for the next operation is unknown until the previous operation completes; hence I would say the complexity would be O(N × T(N)), where T(*n*) is the latency of transferring an arbitrary element from a set of *n* data elements.
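A rough sketch of the two cost estimates, under assumptions that are not part of the argument itself: the latency function is taken to be T(n) = log2(n) purely for illustration, and the function names are hypothetical.

```python
import math

def latency(n):
    """Hypothetical transfer latency T(n); log2 growth is an assumption."""
    return math.log2(n) if n > 1 else 1.0

def dot_product_cost(n, T=latency):
    # Addresses are known in advance, so fetches overlap (pipelining):
    # pay the latency once to fill the pipeline, then one unit per element pair.
    return T(n) + n

def linked_list_cost(n, T=latency):
    # The next address is known only after the current node arrives,
    # so every step pays the full latency plus one unit of work.
    return n * (T(n) + 1)

if __name__ == "__main__":
    for n in (2**10, 2**20):
        print(n, dot_product_cost(n), linked_list_cost(n))
```

Any other increasing T(n) can be passed in as the second argument; the gap between the two costs grows with whatever latency function is chosen.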

I think this (or a similar) machine model could be useful, since data transfer latency is an important consideration for many problems with large input sets.

I realize that such machine models have probably already been proposed and studied, and I would greatly appreciate any pointers in this direction.

**EDIT:** It turns out I was right: this idea was proposed as early as 1987 and, judging by the citations, there has been a lot of follow-up work.

submitted by baddeed


- Analysis of algorithms
- Big O notation
- binary search
- Cache
- cache memory
- CAS latency
- Complexity
- Computational complexity theory
- Computer memory
- CPU cache
- cryptography
- dot product
- Dynamic random access memory
- Hierarchy
- Latency
- linear search
- machine learning
- matrix multiplication algorithms
- Memory hierarchy
- parallel algorithms
- RAM
- random access
- recursive binary search
- search parameters
- SRAM
- Technology
- transfer algorithm
- two algorithms