| <?xml version="1.0" encoding="UTF-8" standalone="no"?> |
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Design</title><meta name="generator" content="DocBook XSL-NS Stylesheets V1.77.1" /><meta name="keywords" content="C++, library, parallel" /><meta name="keywords" content="ISO C++, library" /><meta name="keywords" content="ISO C++, runtime, library" /><link rel="home" href="../index.html" title="The GNU C++ Library" /><link rel="up" href="parallel_mode.html" title="Chapter 18. Parallel Mode" /><link rel="prev" href="parallel_mode_using.html" title="Using" /><link rel="next" href="parallel_mode_test.html" title="Testing" /></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Design</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="parallel_mode_using.html">Prev</a> </td><th width="60%" align="center">Chapter 18. Parallel Mode</th><td width="20%" align="right"> <a accesskey="n" href="parallel_mode_test.html">Next</a></td></tr></table><hr /></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="manual.ext.parallel_mode.design"></a>Design</h2></div></div></div><p> |
| </p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="parallel_mode.design.intro"></a>Interface Basics</h3></div></div></div><p> |
| All parallel algorithms are intended to have signatures that are |
| equivalent to the ISO C++ algorithms replaced. For instance, the |
| <code class="function">std::adjacent_find</code> function is declared as: |
| </p><pre class="programlisting"> |
| namespace std |
| { |
| template<typename _FIter> |
| _FIter |
| adjacent_find(_FIter, _FIter); |
| } |
| </pre><p> |
| Which means that there should be something equivalent for the parallel |
| version. Indeed, this is the case: |
| </p><pre class="programlisting"> |
| namespace std |
| { |
| namespace __parallel |
| { |
| template<typename _FIter> |
| _FIter |
| adjacent_find(_FIter, _FIter); |
| |
| ... |
| } |
| } |
| </pre><p>But.... why the ellipses? |
| </p><p> The ellipses in the example above represent additional overloads |
| required for the parallel version of the function. These additional |
| overloads are used to dispatch calls from the ISO C++ function |
| signature to the appropriate parallel function (or sequential |
| function, if no parallel functions are deemed worthy), based on either |
| compile-time or run-time conditions. |
| </p><p> The available signature options are specific for the different |
| algorithms/algorithm classes.</p><p> The general view of overloads for the parallel algorithms look like this: |
| </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>ISO C++ signature</p></li><li class="listitem"><p>ISO C++ signature + sequential_tag argument</p></li><li class="listitem"><p>ISO C++ signature + algorithm-specific tag type |
| (several signatures)</p></li></ul></div><p> Please note that the implementation may use additional functions |
| (designated with the <code class="code">_switch</code> suffix) to dispatch from the |
| ISO C++ signature to the correct parallel version. Also, some of the |
| algorithms do not have support for run-time conditions, so the last |
| overload is therefore missing. |
| </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="parallel_mode.design.tuning"></a>Configuration and Tuning</h3></div></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="parallel_mode.design.tuning.omp"></a>Setting up the OpenMP Environment</h4></div></div></div><p> |
| Several aspects of the overall runtime environment can be manipulated |
| by standard OpenMP function calls. |
| </p><p> |
| To specify the number of threads to be used for the algorithms globally, |
| use the function <code class="function">omp_set_num_threads</code>. An example: |
| </p><pre class="programlisting"> |
| #include <stdlib.h> |
| #include <omp.h> |
| |
| int main() |
| { |
| // Explicitly set number of threads. |
| const int threads_wanted = 20; |
| omp_set_dynamic(false); |
| omp_set_num_threads(threads_wanted); |
| |
| // Call parallel mode algorithms. |
| |
| return 0; |
| } |
| </pre><p> |
| Some algorithms allow the number of threads being set for a particular call, |
| by augmenting the algorithm variant. |
| See the next section for further information. |
| </p><p> |
| Other parts of the runtime environment able to be manipulated include |
| nested parallelism (<code class="function">omp_set_nested</code>), schedule kind |
| (<code class="function">omp_set_schedule</code>), and others. See the OpenMP |
| documentation for more information. |
| </p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="parallel_mode.design.tuning.compile"></a>Compile Time Switches</h4></div></div></div><p> |
| To force an algorithm to execute sequentially, even though parallelism |
| is switched on in general via the macro <code class="constant">_GLIBCXX_PARALLEL</code>, |
| add <code class="classname">__gnu_parallel::sequential_tag()</code> to the end |
| of the algorithm's argument list. |
| </p><p> |
| Like so: |
| </p><pre class="programlisting"> |
| std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag()); |
| </pre><p> |
| Some parallel algorithm variants can be excluded from compilation by |
| preprocessor defines. See the doxygen documentation on |
| <code class="code">compiletime_settings.h</code> and <code class="code">features.h</code> for details. |
| </p><p> |
| For some algorithms, the desired variant can be chosen at compile-time by |
| appending a tag object. The available options are specific to the particular |
| algorithm (class). |
| </p><p> |
| For the "embarrassingly parallel" algorithms, there is only one "tag object |
| type", the enum _Parallelism. |
| It takes one of the following values, |
| <code class="code">__gnu_parallel::parallel_tag</code>, |
| <code class="code">__gnu_parallel::balanced_tag</code>, |
| <code class="code">__gnu_parallel::unbalanced_tag</code>, |
| <code class="code">__gnu_parallel::omp_loop_tag</code>, |
| <code class="code">__gnu_parallel::omp_loop_static_tag</code>. |
| This means that the actual parallelization strategy is chosen at run-time. |
| (Choosing the variants at compile-time will come soon.) |
| </p><p> |
| For the following algorithms in general, we have |
| <code class="code">__gnu_parallel::parallel_tag</code> and |
| <code class="code">__gnu_parallel::default_parallel_tag</code>, in addition to |
| <code class="code">__gnu_parallel::sequential_tag</code>. |
| <code class="code">__gnu_parallel::default_parallel_tag</code> chooses the default |
| algorithm at compiletime, as does omitting the tag. |
| <code class="code">__gnu_parallel::parallel_tag</code> postpones the decision to runtime |
| (see next section). |
| For all tags, the number of threads desired for this call can optionally be |
| passed to the respective tag's constructor. |
| </p><p> |
| The <code class="code">multiway_merge</code> algorithm comes with the additional choices, |
| <code class="code">__gnu_parallel::exact_tag</code> and |
| <code class="code">__gnu_parallel::sampling_tag</code>. |
| Exact and sampling are the two available splitting strategies. |
| </p><p> |
| For the <code class="code">sort</code> and <code class="code">stable_sort</code> algorithms, there are |
| several additional choices, namely |
| <code class="code">__gnu_parallel::multiway_mergesort_tag</code>, |
| <code class="code">__gnu_parallel::multiway_mergesort_exact_tag</code>, |
| <code class="code">__gnu_parallel::multiway_mergesort_sampling_tag</code>, |
| <code class="code">__gnu_parallel::quicksort_tag</code>, and |
| <code class="code">__gnu_parallel::balanced_quicksort_tag</code>. |
| Multiway mergesort comes with the two splitting strategies for multi-way |
| merging. The quicksort options cannot be used for <code class="code">stable_sort</code>. |
| </p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="parallel_mode.design.tuning.settings"></a>Run Time Settings and Defaults</h4></div></div></div><p> |
| The default parallelization strategy, the choice of specific algorithm |
| strategy, the minimum threshold limits for individual parallel |
| algorithms, and aspects of the underlying hardware can be specified as |
| desired via manipulation |
| of <code class="classname">__gnu_parallel::_Settings</code> member data. |
| </p><p> |
| First off, the choice of parallelization strategy: serial, parallel, |
| or heuristically deduced. This corresponds |
| to <code class="code">__gnu_parallel::_Settings::algorithm_strategy</code> and is a |
| value of enum <span class="type">__gnu_parallel::_AlgorithmStrategy</span> |
| type. Choices |
| include: <span class="type">heuristic</span>, <span class="type">force_sequential</span>, |
| and <span class="type">force_parallel</span>. The default is <span class="type">heuristic</span>. |
| </p><p> |
| Next, the sub-choices for algorithm variant, if not fixed at compile-time. |
| Specific algorithms like <code class="function">find</code> or <code class="function">sort</code> |
| can be implemented in multiple ways: when this is the case, |
| a <code class="classname">__gnu_parallel::_Settings</code> member exists to |
| pick the default strategy. For |
| example, <code class="code">__gnu_parallel::_Settings::sort_algorithm</code> can |
| have any values of |
| enum <span class="type">__gnu_parallel::_SortAlgorithm</span>: <span class="type">MWMS</span>, <span class="type">QS</span>, |
| or <span class="type">QS_BALANCED</span>. |
| </p><p> |
| Likewise for setting the minimal threshold for algorithm |
| parallelization. Parallelism always incurs some overhead. Thus, it is |
| not helpful to parallelize operations on very small sets of |
| data. Because of this, measures are taken to avoid parallelizing below |
| a certain, pre-determined threshold. For each algorithm, a minimum |
| problem size is encoded as a variable in the |
| active <code class="classname">__gnu_parallel::_Settings</code> object. This |
| threshold variable follows the following naming scheme: |
| <code class="code">__gnu_parallel::_Settings::[algorithm]_minimal_n</code>. So, |
| for <code class="function">fill</code>, the threshold variable |
| is <code class="code">__gnu_parallel::_Settings::fill_minimal_n</code>, |
| </p><p> |
| Finally, hardware details like L1/L2 cache size can be hardwired |
| via <code class="code">__gnu_parallel::_Settings::L1_cache_size</code> and friends. |
| </p><p> |
| </p><p> |
| All these configuration variables can be changed by the user, if |
| desired. |
| There exists one global instance of the class <code class="classname">_Settings</code>, |
| i. e. it is a singleton. It can be read and written by calling |
| <code class="code">__gnu_parallel::_Settings::get</code> and |
| <code class="code">__gnu_parallel::_Settings::set</code>, respectively. |
| Please note that the first call return a const object, so direct manipulation |
| is forbidden. |
| See <a class="link" href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a01005.html" target="_top"> |
| <code class="filename">settings.h</code></a> |
| for complete details. |
| </p><p> |
| A small example of tuning the default: |
| </p><pre class="programlisting"> |
| #include <parallel/algorithm> |
| #include <parallel/settings.h> |
| |
| int main() |
| { |
| __gnu_parallel::_Settings s; |
| s.algorithm_strategy = __gnu_parallel::force_parallel; |
| __gnu_parallel::_Settings::set(s); |
| |
| // Do work... all algorithms will be parallelized, always. |
| |
| return 0; |
| } |
| </pre></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="parallel_mode.design.impl"></a>Implementation Namespaces</h3></div></div></div><p> One namespace contain versions of code that are always |
| explicitly sequential: |
| <code class="code">__gnu_serial</code>. |
| </p><p> Two namespaces contain the parallel mode: |
| <code class="code">std::__parallel</code> and <code class="code">__gnu_parallel</code>. |
| </p><p> Parallel implementations of standard components, including |
| template helpers to select parallelism, are defined in <code class="code">namespace |
| std::__parallel</code>. For instance, <code class="function">std::transform</code> from <code class="filename">algorithm</code> has a parallel counterpart in |
| <code class="function">std::__parallel::transform</code> from <code class="filename">parallel/algorithm</code>. In addition, these parallel |
| implementations are injected into <code class="code">namespace |
| __gnu_parallel</code> with using declarations. |
| </p><p> Support and general infrastructure is in <code class="code">namespace |
| __gnu_parallel</code>. |
| </p><p> More information, and an organized index of types and functions |
| related to the parallel mode on a per-namespace basis, can be found in |
| the generated source documentation. |
| </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="parallel_mode_using.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="parallel_mode.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="parallel_mode_test.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Using </td><td width="20%" align="center"><a accesskey="h" href="../index.html">Home</a></td><td width="40%" align="right" valign="top"> Testing</td></tr></table></div></body></html> |