ems-typed
Version:
Persistent Shared Memory and Parallel Programming Model
493 lines (430 loc) • 20.6 kB
HTML
<html>
<!-- --------------------------------------------------------------------------+
| Extended Memory Semantics (EMS) Version 1.6.1 |
| http://mogill.com/ jace@mogill.com |
+-----------------------------------------------------------------------------+
| Copyright (c) 2011-2014, Synthetic Semantics LLC. All rights reserved. |
| Copyright (c) 2015-2020, Jace A Mogill. All rights reserved. |
| |
| Redistribution and use in source and binary forms, with or without |
| modification, are permitted provided that the following conditions are met: |
| * Redistributions of source code must retain the above copyright |
| notice, this list of conditions and the following disclaimer. |
| * Redistributions in binary form must reproduce the above copyright |
| notice, this list of conditions and the following disclaimer in the |
| documentation and/or other materials provided with the distribution. |
| * Neither the name of the Synthetic Semantics nor the names of its |
| contributors may be used to endorse or promote products derived |
| from this software without specific prior written permission. |
| |
| THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS |
| "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT |
| LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR |
| A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SYNTHETIC |
| SEMANTICS LLC BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, |
| EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, |
| PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR |
| PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF |
| LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING |
| NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS |
| SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
| |
+-------------------------------------------------------------------------- -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="./docs.css">
<link rel="icon" href="favicon.ico" type="image/x-icon">
<link rel="shortcut icon" href="favicon.ico" type="image/x-icon">
<title>Extended Memory Semantics -- Overview</title>
</head>
<body>
<div style="font-family:Gotthard; font-size: 40px; vertical-align:middle; margin-left: 1%">
<a href="http://mogill.com">
<span style="vertical-align:middle;">Extended Memory Semantics</span>
</a>
</div>
<div style="padding-left:3%; font-size:1.2em;">
<a href="index.html"> Overview of EMS </a>
|
<a href="reference.html"> API Reference </a>
|
<a href="https://www.npmjs.org/package/ems"> Node.js NPM </a>
|
<a href="https://github.com/mogill/ems"> Download at GitHub </a>
</div>
<h1>
Extended Memory Semantics
</h1>
<p class="pmulti" >
<span style="font-size: 1.1em; font-weight:bold; font-style:italic;">
Extended Memory Semantics (EMS) complements serial programming
models with
transactional and other fine-grained synchronization capabilities
to support parallel programming.
</span>
<br><br>
Much of the challenge in implementing distributed and parallel
programs derives from finding, marshaling, and synchronizing
data. Extended Memory Semantics (EMS) unifies these tasks into a
single programming and execution model. EMS implements a shared
address space with a rich set of primitives for parallel access of data
structures. It is not a source of parallelism itself, instead it
complements other parallel programming models and integrates shared
memory data access and synchronization.
<br><br>
EMS leverages existing tool chains instead of replacing them and is
compatible with legacy applications, libraries, frameworks,
operating systems, and hardware.
Because EMS represents
memory and not processes, it may persist independently of any
application, and it's state may be replicated, archived, or forked.
Applications may attach and detach from the memory in much the
same way applications use a shared database or filesystem.
</p>
<P>
<h6>Synchronization As a Property of the Data, Not a Duty for Tasks
</h6>
<p class="pmulti">
EMS internally stores tags that are used for synchronization of
user data, allowing synchronization to happen independently of
the number or kind of processes accessing the data. The tags
can be thought of as being in one of three states, <em>Empty,
Full,</em> or <em>Read-Only</em>, and the EMS primitives enforce
atomic access through automatic state transitions.
<br>
<img style="clear:both; height:200px; margin-left: 30px;"
src="./fsmSimple.svg" type="image/svg+xml" />
<span class="figcaption">
<br><br> EMS Data Tag Transitions & Atomic operations:
F=Full, E=Empty, X=Don't Care, RW=Readers-Writer lock (# of current readers)
CAS=Compare-and-Swap, FAA=Fetch-and-Add
</span>
<br><br>
The function name <code>readFE</code> means "Read when full and mark empty",
<code>writeEF</code> means "Write when empty and mark full",
<code>writeXF</code> means "Write unconditionally and mark
full", etc. In the most simple case, full-empty tags are used
to block readers until memory is marked full by a writer thread
that itself blocks until the memory is marked empty. This is
effectively a dataflow or producer-consumer execution model that
may involve any number of producers and consumers.
<br><br>
<img style="height:120px; margin-left: 30px;"
src="./memLayoutLogical.svg" type="image/svg+xml" />
<span class="figcaption">
<br><br>EMS memory is an array of JSON primitive values
(Number, Boolean, String, or Undefined) accessed using atomic
operators and/or transactional memory. Safe parallel access
is managed by passing through multiple gates: First mapping a
key to an index, then accessing user data protected by EMS
tags, and completing the whole operation atomically.
</span>
<br><br>
The EMS array may be indexed directly using an integer, or using a key
mapping of any primitive type. When a map is used, the key and data
itself are updated atomically.
<br><br>
The full-empty primitives are
used construct other thread-safe data types such as
atomic read-modify-write operations and Transactional Memory (TM).
</p>
<h3>
Principles of Operation
</h3>
<p class="pmulti" >
When the <code>require('ems')(...)</code> statement is executed by a program,
EMS first creates a shared memory region to rendezvous and
communicate with other EMS threads, then,
using the built-in
<img src="./nodejs.svg" type="image/svg+xml" height="16px" style="vertical-align:text-top;" />
fork primitives,
creates the additional threads executing
using one of two execution models: fork-join or Bulk Synchronous Parallel (BSP).
BSP invokes the same script as the master thread (found in <code>process.argv[2]</code>),
whereas fork-join execution invokes parallel region around a function.
<br><br>
Under BSP, all threads execute the entire program unless statements are explicitly skipped.
Fork-join parallelism has a single entry point and executes sequentially
until a parallel region is started with <code>ems.parallel( func )</code>.
<br>
<img style="width:200px; margin-left: 80px;" src="./BSPvsForkJoin.svg" type="image/svg+xml" />
<span class="figcaption"><br>
Fork-Join parallelism follows the traditional single-threaded execution
model until a parallel region where threads are dynamically added to
perform iterations of a loop. Under BSP parallelism every thread
enters the program at the main entry point.
</span>
<BR><BR>
Fork-join creates parallel regions
much like OpenMP's <code>#pragma omp parallel</code> directive.
Under BSP, all threads enter the main program and execute all
statements, synchronizing at barriers.
<BR><BR>
In addition to ordinary sequential loops,
within a parallel region <code>ems.parForEach( func )</code>
loops distribute iterations among the threads
using several load balancing scheduling options.
<br><br>
The master thread preserves all the characteristics and capabilities of
an ordinary
<img src="./nodejs.svg" type="image/svg+xml" height="16px" style="vertical-align:text-top;"/>
job, and all legacy applications, modules, packages, frameworks, and test apparatus
will work normally. Software that does not use EMS is not affected by it's presence.
<br><br>
Atomic operations like compare-and-swap (CAS) and fetch-and-add (FAA)
that are typically non-blocking will block if the full/empty
tag is set to empty.
Stack/queue operators are deadlock free,
blocking operations and should be thought of as thread-safe but not concurrent.
EMS transactions are also deadlock free and support element-level locking
for the highest possible currency.
<br><br>
Dataflow programs directly manipulating the full/empty tags
may deadlock if a program attempts to re-acquire a lock
already held, or acquire locks in a different order than other threads.
<br><br>
EMS programs may be run with any number of threads, including
single threaded and over-subscribed.
</P>
<br>
<center>
<img src="./blockDiagram.svg" type="image/svg+xml" width="90%" style="vertical-align:text-top;"/>
<span class="figcaption">
<br><br>
A logical overview of what program statements cause threads to be created
and how shared data is referenced.
</span>
</center>
<h3>Performance</h3>
<p>
These experiments were run on an Amazon EC2 instance:<br>
<code>cr1.8xlarge: 244 GiB memory, 88 EC2 Compute Units, 240 GB of local instance storage, 64-bit platform, 10 Gigabit Ethernet</code>
</p>
<p class="pmulti" >
<img src="tm_from_q.svg" type="image/svg+xml" height="260px" style="vertical-align:text-top;"/>
<span class="figcaption"><br>
Using the general transactional memory capabilities to process
randomly generated operations stored in a shared work queue.
</span>
<br><br><br><br>
<img src="./TMfromLoop.png" type="image/svg+xml" height="260px" style="vertical-align:text-top;"/>
<span class="figcaption"><br>
Transaction processing but generating the work from a loop instead of reading it from a shared queue.
</span>
<br><br><br><br>
<img src="./ccab.png" type="image/svg+xml" height="260px" style="vertical-align:text-top;"/>
<span class="figcaption"><br>
Perform the operation <code>c[i] = c[i] + a[i] * b[i]</code> atomically
</span>
<br><br><br><br>
<img src="./wordcount.svg" type="image/svg+xml" height="260px" style="vertical-align:text-top;"/>
<span class="figcaption"><br>
Word Count of documents from Project Gutenberg in a variety of languages. Average document was
about 250kb in length.
</span>
</p>
<h3>Built-In Composed Operations and Parallel Data Structures</h3>
<p>
High-level data abstractions can be constructed from the EMS primitives,
and EMS itself
composes the primitives to implement transactional memory (TM), stacks, and queues.
User defined composed operations can be added to EMS classes just as new methods
are added to other JavaScript objects.
</p>
<h6>Transactional Memory</h6>
<P>
Transactional Memory (TM) provides atomic access to multiple shared objects
in a manner similar to transactional databases. EMS implements mutual exclusion
on specific data elements using the Full/Empty tags,
and shared read-only access with a multiple readers-single writer tag.
</p>
<h6>Stacks and Queues</h6>
<P>
Parallel-safe stacks and queues are built-in intrinsics based on Full/Empty tags.
Stacks and queues are by definition serial data structures and do
not support any concurrency. Although highly efficient, a shared
resource like these can become a hot-spot when dozens of threads compete
for access.
</p>
<h5>
Types of Parallelism
</h5>
<P>
<center>
<figure>
<img src="./typesOfParallelism.svg" type="image/svg+xml" width="500px" />
<figcaption><br>
EMS Data Tag Transitions - The four data element states
and the intrinsic EMS atomic operations to transition
between them.
</figcaption>
</figure>
</center>
</p>
<h5>
Why Shared Memory Parallelism?
</h5>
<P>
<center>
<figure>
<img src="./timelines.svg" type="image/svg+xml" width="100%" style="max-width:900px;"/>
<figcaption><br>
Multithreading complements other forms of parallelism and can be
combined with other forms of concurrency for multiplicative benefits.
</figcaption>
</figure>
</center>
</p>
<h5>
Contrary Notions of Strong & Weak Scaling
</h5>
<P>
<center>
<table width="90%" style="border:0px solid #303030; border-collapse:collapse; padding-bottom:24%;">
<tr>
<td style="border:0px; vertical-align:top;" align="top" width="20%">
</TD>
<td style="border:0px solid #303030; vertical-align:top;" align="top" width="40%">
<center style="float:top; font-size:1.3em;">
<b>Strong Scaling</b> <br>
<img src="./strong_scaling.svg" type="image/svg+xml" width="300px"
style="border:0; box-shadow: 0px 0px 0px rgba(0,0,0,0);" />
</center>
</TD>
<td style="border:0px solid #303030; vertical-align:top;" align="top" width="40%">
<center style="float:top; font-size:1.3em;">
<b>Weak Scaling</b> <br>
<img src="./weak_scaling.svg" type="image/svg+xml" width="300px"
style="border:0; box-shadow: 0px 0px 0px rgba(0,0,0,0);" />
</center>
</td>
</tr>
<tr style="background-color:rgba(0,0,150,.2);">
<td style="vertical-align:middle; text-align:right; font-size:1.3em; padding-right:25px;">
<em>Scaling Goal</em>
</TD>
<td style="vertical-align:middle; padding-top:1%; padding-bottom:1%; padding-left:2%;">
Solve the same problem, only faster
</TD>
<td style="vertical-align:middle; padding-top:1%; padding-bottom:1%; padding-left:2%;">
Solve a bigger problem in the same amount of time
</TD>
</tr>
<tr style="background-color:rgba(0,100,0,.2);">
<td style="vertical-align:middle; text-align:right; font-size:1.3em; padding-right:25px;">
<em>Problem Size</em>
</TD>
<td style="vertical-align:middle; padding-top:1%; padding-bottom:1%; padding-left:2%;">
Stays constant while number of processors increases
</TD>
<td style="vertical-align:middle; padding-top:1%; padding-bottom:1%; padding-left:2%;">
Grows with the number of processors
</TD>
</tr>
<tr style="background-color:rgba(0,0,150,.2);">
<td style="vertical-align:middle; text-align:right; font-size:1.3em; padding-right:25px;">
<em>Scaling is limited by</em>
</TD>
<td style="vertical-align:middle; padding-top:1%; padding-bottom:1%; padding-left:2%;">
Inter-process communication
</TD>
<td style="vertical-align:middle; padding-top:1%; padding-bottom:1%; padding-left:2%;">
Problem size
</TD>
</tr>
<tr style="background-color:rgba(0,100,0,.2);">
<td style="vertical-align:middle; text-align:right; font-size:1.3em; padding-right:25px;">
<em>Resiliency</em>
</TD>
<td style="vertical-align:middle; padding-top:1%; padding-bottom:1%; padding-left:2%;">
Single failure causes entire job to fail,
SLAs achieved through efficient checkpoint-restart.
</TD>
<td style="vertical-align:middle; padding-top:1%; padding-bottom:1%; padding-left:2%;">
Failed sub-tasks are detected and retried. SLAs achieved through fault resiliency.
</TD>
</tr>
<tr style="background-color:rgba(0,0,150,.2);">
<td style="vertical-align:middle; text-align:right; font-size:1.3em; padding-right:25px;">
<em>Programming Models</em>
</TD>
<td style="vertical-align:middle; padding-top:1%; padding-bottom:1%; padding-left:2%;">
MPI, GASNet, Chapel, X10, Co-Array Fortran, UPC
</TD>
<td style="vertical-align:middle; padding-top:1%; padding-bottom:1%; padding-left:2%;">
Batch jobs, Map-Reduce
</TD>
</tr>
</table>
</center>
</p>
<h5>
Historical Precedents for Data-Centric Multithreading
</h5>
<p class="pmulti">
EMS builds on three experimental computer architectures from the
1980's: the NYU Ultracomputer, the MIT J-Machine, and the Tera Multi-Threaded Architecture (MTA).
Specifically, the Ultra introduced combining networks as a basic
architectural feature,
the J-Machine made moving a task to data as easy as moving data to the processor,
and the MTA used massive multithreading to mask latency and
had fine-grained synchronization associated with the data, not tasks.
</P>
<h3>
Bugs
</h3>
<P>
The old data argument for CAS requires a deep compare -- this is possible but
entirely compatible with JS semantics which compares type, not contents.
<br><br>
Similarly, FAA does not implement add on object type data, but extending the
semantics of JS would make this sensible
</P>
<h3>
Roadmap
</h3>
<h6>
Languages and APIs
</h6>
<P>
In addition to JavaScript, Python, and C/C++, EMS can be added to other languages.
Languages that can share the EMS API:
<br> - Scala
<br> - PHP
<br> - Java
<br> - Perl
<br> - Your name here
</P>
<h6>
Examples, Benchmarks, Tests
</h6>
<P>
Other programs included with the distribution for demonstration, test, and benchmarking purposes.
<br> - Matrix multiply
<br> - Graph500
<br> - Sobel Filter
<br> - MongoDB Replica Server
</P>
<h6>
RiverTrail Extensions
</h6>
<P>
A few years ago a proposed set of language extensions called RiverTrail
was being supported by Intel and was implemented as a Firefox Plug-In.
The hooks used by the extension are no longer supported in Firefox, and
the future of RiverTrail is unclear. To the extent possible, EMS should
implement RiverTrail extensions.
</P>
<hr>
<div style="font-size:.85em;">
This browsing experience is Copyright ©2014-2020,
<span style="font-family:Gotthard;">
<a href="http://mogill.com">
Jace Mogill</a>
</span>.
Proudly designed and built in Cascadia.
<img src="./Flag_of_Cascadia.svg" type="image/svg+xml" height="20px" style="vertical-align:middle;">
</div>
</body>
</html>