<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://ascend4.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Arash</id>
	<title>ASCEND - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://ascend4.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Arash"/>
	<link rel="alternate" type="text/html" href="https://ascend4.org/Special:Contributions/Arash"/>
	<updated>2026-04-28T22:00:18Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.43.6</generator>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=3024</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=3024"/>
		<updated>2011-08-21T11:51:04Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* compiler_bincuda.qrcuda/mwcol/bcacol */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator.&lt;br /&gt;
* Integrate the approach into QRCUDA.&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project on different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Write clear step-by-step instructions that allow a new user to set up, test and use the solver.&lt;br /&gt;
** Improve the general architecture.&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernels&#039; memory access pattern to coalesced access.&lt;br /&gt;
&amp;lt;!-- *** Store the mapping information into fast texture constant memory --&amp;gt;&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation, so that the CPU can evaluate the small equation groups while the GPU is busy evaluating the large ones.&lt;br /&gt;
&amp;lt;!-- *** support for models containing &#039;external relations&#039;--&amp;gt;&lt;br /&gt;
** Prepare a &amp;lt;!-- multi platform --&amp;gt; Makefile to compile and build BinCUDAs&lt;br /&gt;
&amp;lt;!-- ** Complete the external functions in “btcudapl.cu”--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator.&lt;br /&gt;
** Review the variations of Armijo&#039;s rule (Grippo et al., 1986).&lt;br /&gt;
** Convert the current kernels from 2D into 3D kernels; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels.&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns the index of the input vector that produced it.&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in PyGTK.&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API.&lt;br /&gt;
*** Test for CUDA hardware availability when the solver is first loaded; make QRCUDA available only if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project on different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard (pageable) to pinned memory, which makes data transfers between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluation, so the CPU can evaluate small equation groups while the GPU is busy evaluating the large ones.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable in mass balance mode.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Clean-up of the prototype.&lt;br /&gt;
** The GPU initialization and shutdown methods were moved into QRCUDA.&lt;br /&gt;
** The dependency on the common makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** The Linux version of the BinCUDA makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented.&lt;br /&gt;
** A new form was added to the main GUI that shows information about the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unload bug was fixed during the clean-up.&lt;br /&gt;
* After 16-June&lt;br /&gt;
** More clean-ups in the BinCUDAs.&lt;br /&gt;
** The active block evaluation mechanism was added to the batch evaluator.&lt;br /&gt;
** QRCUDA now uses GPU-based model evaluation for residual evaluation in large blocks (the code was tested on {{srcbranch|arash|models/test/bintok/bincuda/test2.a4c}}; more testing is required).&lt;br /&gt;
** QRCUDA was tested with {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}}; after some bug fixes, the GPU evaluator results are now identical to those of the standard calc_residuals method.&lt;br /&gt;
* After 26-June&lt;br /&gt;
** The test case was modified to solve the distillation model in both mass balance and energy balance modes.&lt;br /&gt;
** Performance analysis with valgrind and gprof.&lt;br /&gt;
** Bug fix in PyGTK: the system is now re-analyzed after methods are executed.&lt;br /&gt;
** QRCUDA solved its first large model (31733 equations) in mass balance and energy balance modes; the results are identical to the QRSlv results. Both solvers converged, and the self_test method ran without errors ({{srcbranch|arash|models/test/bintok/bincuda/mwcolumn.a4c}}).&lt;br /&gt;
* After 6-July&lt;br /&gt;
** QRCUDA was integrated into PyGTK.&lt;br /&gt;
** QRCUDA now uses ASCEND&#039;s standard parameter handling mechanism.&lt;br /&gt;
** Functionality was added to QRCUDA that reports GPU block evaluation timing to PyGTK.&lt;br /&gt;
** An extensive search was carried out to create large, solvable models (larger than the current 30000 equations). During this search, QRCUDA was tested with different models, and several bugs in QRCUDA were identified and fixed.&lt;br /&gt;
** The next step is to create GPU-based line search.&lt;br /&gt;
* After 16-July&lt;br /&gt;
** The heuristic formula for the multi-vector residual evaluator was defined (Armijo rule).&lt;br /&gt;
** Research on the different variations of the Armijo rule was completed, and I decided to use 0.5 as the coefficient. The main reason is that 0.5^N can be computed as 1 divided by a left-shifted integer (2^N), which gives a large performance advantage over any other coefficient.&lt;br /&gt;
* After 26-July&lt;br /&gt;
** The evaluator kernels were converted from 2D into 3D kernels (the extra dimension indexes the input vectors generated with the Armijo rule).&lt;br /&gt;
** A parallel kernel was implemented to calculate the squared norm of the residuals.&lt;br /&gt;
*** The norm calculator was extended to find the minimum squared norm over all vectors in a multi-vector evaluation.&lt;br /&gt;
** A unit test was created for the multi-vector evaluators.&lt;br /&gt;
* After 6-August&lt;br /&gt;
** A concurrent kernel launcher (using CUDA streams) was implemented for the residual evaluator kernels. In a model with ~80000 relations, the evaluators now run 4x faster than in the previous version, which launched kernels sequentially.&lt;br /&gt;
** The multi-vector evaluators were tested, and their results were identical to those of the standard CPU-based evaluators.&lt;br /&gt;
** The multi-vector evaluators were integrated into the line-search algorithm of QRCUDA.&lt;br /&gt;
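&lt;br /&gt;
The multi-vector line search described above can be summarized in standard notation. This is a sketch for orientation, not a transcription of the QRCUDA code. The symbols F (residual vector of a block), x_k (current iterate), d_k (Newton step), M (number of candidate vectors evaluated concurrently) and σ (Armijo constant, between 0 and 1) are introduced here for illustration. With the coefficient 0.5, the candidate points are&lt;br /&gt;
:&amp;lt;math&amp;gt;\alpha_j = 0.5^j = \frac{1}{2^j}, \qquad x^{(j)} = x_k + \alpha_j d_k, \qquad j = 0, 1, \dots, M-1,&amp;lt;/math&amp;gt;&lt;br /&gt;
and the norm kernel selects the candidate with the smallest squared residual norm:&lt;br /&gt;
:&amp;lt;math&amp;gt;j^* = \arg\min_{0 \le j \le M-1} \left\| F\left(x^{(j)}\right) \right\|_2^2.&amp;lt;/math&amp;gt;&lt;br /&gt;
For reference, the textbook Armijo test accepts a step length α for the merit function &amp;lt;math&amp;gt;f(x) = \tfrac{1}{2}\|F(x)\|_2^2&amp;lt;/math&amp;gt; when&lt;br /&gt;
:&amp;lt;math&amp;gt;f(x_k + \alpha d_k) \le f(x_k) + \sigma \alpha \nabla f(x_k)^\top d_k.&amp;lt;/math&amp;gt;&lt;br /&gt;
This page does not detail how QRCUDA combines this test with the minimum-norm selection. Because 2^j is a power of two, each step length needs only an integer shift and one division.&lt;br /&gt;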
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
Ideas and issues with the current implementation are listed below (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. How can we optimize this function?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or later). The card must support double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and the BinCUDA Makefile must be pointed at the SDK directory.&lt;br /&gt;
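&lt;br /&gt;
You can check whether a card meets the double-precision requirement from its compute capability (listed, for example, in NVIDIA&#039;s CUDA GPU tables or printed by the deviceQuery SDK sample). The shell sketch below compares a capability string against 1.3. The function name supports_double is hypothetical and is not part of ASCEND or the CUDA tools.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# Succeed if a compute capability string such as 1.3 or 2.0&lt;br /&gt;
# supports double precision (compute capability 1.3 or higher).&lt;br /&gt;
supports_double() {&lt;br /&gt;
    major=${1%%.*}    # part before the first dot&lt;br /&gt;
    minor=${1#*.}     # part after the first dot&lt;br /&gt;
    if [ "$major" -gt 1 ]; then&lt;br /&gt;
        return 0      # 2.x (Fermi) and newer&lt;br /&gt;
    elif [ "$major" -eq 1 ]; then&lt;br /&gt;
        [ "$minor" -ge 3 ]    # 1.3 was the first version with doubles&lt;br /&gt;
        return                # returns the status of the test above&lt;br /&gt;
    fi&lt;br /&gt;
    return 1&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
for cc in 1.1 1.3 2.0; do&lt;br /&gt;
    if supports_double "$cc"; then&lt;br /&gt;
        echo "$cc: double precision supported"&lt;br /&gt;
    else&lt;br /&gt;
        echo "$cc: double precision NOT supported"&lt;br /&gt;
    fi&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;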
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; however, you must replace the Ubuntu 10.04 32-bit file addresses below with the equivalent files for your distribution from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1 and log in. Then stop the X server, run the driver installer and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now be running with the new NVIDIA driver, and you can install the CUDA 3.2 toolkit (the default directory, i.e. /usr/local/cuda, is recommended&amp;lt;!-- and ~/NVIDIA_GPU_Computing_SDK --&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
&amp;lt;!-- wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run --&amp;gt;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
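&lt;br /&gt;
If ~/.bashrc is sourced more than once, the lines above append the CUDA directories again each time. The shell sketch below assumes the default /usr/local/cuda location. It appends each directory only when it is missing, and it uses lib64 when that directory exists (64-bit toolkits typically install their libraries there). The helper name append_path is hypothetical.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# Print the colon-separated list $1 with directory $2 appended,&lt;br /&gt;
# unless $2 is already in the list.&lt;br /&gt;
append_path() {&lt;br /&gt;
    case ":$1:" in&lt;br /&gt;
        *":$2:"*) printf '%s\n' "$1" ;;      # already listed&lt;br /&gt;
        ::) printf '%s\n' "$2" ;;            # list was empty&lt;br /&gt;
        *) printf '%s\n' "$1:$2" ;;&lt;br /&gt;
    esac&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
CUDA_HOME=/usr/local/cuda&lt;br /&gt;
if [ -d "$CUDA_HOME/lib64" ]; then&lt;br /&gt;
    CUDA_LIB=$CUDA_HOME/lib64    # 64-bit toolkit&lt;br /&gt;
else&lt;br /&gt;
    CUDA_LIB=$CUDA_HOME/lib      # 32-bit toolkit&lt;br /&gt;
fi&lt;br /&gt;
PATH=$(append_path "$PATH" "$CUDA_HOME/bin")&lt;br /&gt;
LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" "$CUDA_LIB")&lt;br /&gt;
export PATH LD_LIBRARY_PATH&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;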
&lt;br /&gt;
&amp;lt;!-- 5) Now you should be able to compile the SDK samples,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples will be created in the &amp;quot;bin&amp;quot; directory. For example you should be able to run N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I REMOVED THESE STEPS AS NOW THE SDK SAMPLE DOWNLOAD IS NOT REQUIRED.. arash&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK, set the CUDA_INSTALL_PATH variable in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK installation directory (/usr/local/cuda if the default directory was used).&lt;br /&gt;
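&lt;br /&gt;
You can also make this change from the shell. The sketch below assumes GNU sed and that the Makefile assigns CUDA_INSTALL_PATH on a single line (with =, := or ?=). The helper name set_cuda_path is hypothetical.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# Rewrite the CUDA_INSTALL_PATH assignment in a Makefile and print the result.&lt;br /&gt;
set_cuda_path() {&lt;br /&gt;
    makefile=$1&lt;br /&gt;
    cuda_dir=$2&lt;br /&gt;
    # GNU sed -i edits the file in place; the whole assignment line is replaced&lt;br /&gt;
    sed -i "s|^CUDA_INSTALL_PATH[[:space:]]*[?:]*=.*|CUDA_INSTALL_PATH = $cuda_dir|" "$makefile"&lt;br /&gt;
    grep '^CUDA_INSTALL_PATH' "$makefile"&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Example, run from the top of the development branch:&lt;br /&gt;
# set_cuda_path ascend/bintokens/bincuda/Makefile /usr/local/cuda&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;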
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. You can adjust the number of relations in the model by scaling the two parameters, 100 and 51, by the same multiplicative factor. For example, {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}} uses a factor of 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column ({{srcbranch|arash|models/test/bintok/bincuda/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the tests ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test suite was prepared to test the QRCUDA solver and the generated CUDA model evaluator objects (i.e. BinCUDAs). The test suite code is in test_bincuda.c ({{srcbranch|arash|ascend/compiler/test/test_bincuda.c}}) and contains six test functions: gen, satpnt, multivec, qrcuda, mwcol and bcacol.&lt;br /&gt;
To run a test, execute &amp;quot;test/test compiler_bincuda.[test function name]&amp;quot; in the top-level ASCEND directory.&lt;br /&gt;
For more information about how QRCUDA and the BinCUDAs interact, refer to {{srcbranch|arash|ascend/bintokens/bincuda/BinCUDA_Readme.txt}}.&lt;br /&gt;
To change the current benchmark model, change the DEF_FILENAMESTEM macro in the code.&lt;br /&gt;
[Note that if your model uses any ASCEND-specific function (e.g. asc_ipow), that function must be defined in btcudapl.cu ({{srcbranch|arash|ascend/bintokens/bincuda/btcudapl.cu}}).]&lt;br /&gt;
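&lt;br /&gt;
To run all six test functions in one go and see which ones fail, you can use a loop like the following shell sketch from the top-level ASCEND directory. The helper name run_bincuda_tests is hypothetical. It calls the test runner once per test function and returns the number of failures.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# Run each compiler_bincuda test function and count the failures.&lt;br /&gt;
# $1 optionally overrides the runner (default: test/test).&lt;br /&gt;
run_bincuda_tests() {&lt;br /&gt;
    runner=${1:-test/test}&lt;br /&gt;
    failed=0&lt;br /&gt;
    for t in gen satpnt multivec qrcuda mwcol bcacol; do&lt;br /&gt;
        if "$runner" "compiler_bincuda.$t"; then&lt;br /&gt;
            echo "PASS compiler_bincuda.$t"&lt;br /&gt;
        else&lt;br /&gt;
            echo "FAIL compiler_bincuda.$t"&lt;br /&gt;
            failed=$((failed + 1))&lt;br /&gt;
        fi&lt;br /&gt;
    done&lt;br /&gt;
    return "$failed"&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Usage: run_bincuda_tests&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;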
&lt;br /&gt;
=== compiler_bincuda.gen === &lt;br /&gt;
This test function outputs the CPU-based evaluation time, the GPU-based evaluation time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, and the Makefile in the same directory builds the shared binary object for the BinCUDAs. The CUDA build and compile commands are provided in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
=== compiler_bincuda.satpnt ===&lt;br /&gt;
&lt;br /&gt;
In the multi-vector residual evaluator, the model is evaluated concurrently for multiple input vectors. Because of the GPU&#039;s parallel architecture, evaluating several input vectors can take the same time as evaluating a single one, as long as the GPU still has idle capacity. The &amp;quot;satpnt&amp;quot; test function determines the &#039;&#039;saturation point&#039;&#039; for a specific model. We define the saturation point as the maximum number of vectors for which the computational time of the concurrent residual evaluation equals the time measured for a single input vector.&lt;br /&gt;
&lt;br /&gt;
Note that this test function measures only the computational time; the time for data transfer between CPU and GPU is not included in the results.&lt;br /&gt;
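&lt;br /&gt;
The definition can also be written as a formula (a sketch; the symbols are introduced here for illustration and do not come from the test code). Let T(N) be the measured GPU computation time for evaluating the residuals at N input vectors concurrently, excluding host-device transfers. The saturation point is then&lt;br /&gt;
:&amp;lt;math&amp;gt;N_{\mathrm{sat}} = \max \left\{ N \ge 1 : T(N) \approx T(1) \right\},&amp;lt;/math&amp;gt;&lt;br /&gt;
where ≈ means equal within the timing noise of the measurement. Up to N_sat, extra trial vectors in the multi-vector line search cost essentially no additional computation time.&lt;br /&gt;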
&lt;br /&gt;
=== compiler_bincuda.multivec ===&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;multivec&amp;quot; test function verifies the results of the multi-vector evaluators against the standard CPU-based implementation in the ASCEND framework. It then measures the computational performance of the multi-vector evaluators.&lt;br /&gt;
&lt;br /&gt;
=== compiler_bincuda.qrcuda/mwcol/bcacol ===&lt;br /&gt;
&lt;br /&gt;
These test functions solve {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}}, {{srcbranch|arash|models/test/bintok/bincuda/mwcolumn.a4c}} and {{srcbranch|arash|models/test/bintok/bincuda/bcacolumn.a4c}}, respectively.&lt;br /&gt;
[[Category:GSOC2011]][[Category:ASCEND Contributors]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=3023</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=3023"/>
		<updated>2011-08-21T11:41:28Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Running the test */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator.&lt;br /&gt;
* Integrate the approach into QRCUDA.&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project on different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Write clear step-by-step instructions that allow a new user to set up, test and use the solver.&lt;br /&gt;
** Improve the general architecture.&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernels&#039; memory access pattern to coalesced access.&lt;br /&gt;
&amp;lt;!-- *** Store the mapping information into fast texture constant memory --&amp;gt;&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation, so that the CPU can evaluate the small equation groups while the GPU is busy evaluating the large ones.&lt;br /&gt;
&amp;lt;!-- *** support for models containing &#039;external relations&#039;--&amp;gt;&lt;br /&gt;
** Prepare a &amp;lt;!-- multi platform --&amp;gt; Makefile to compile and build BinCUDAs&lt;br /&gt;
&amp;lt;!-- ** Complete the external functions in “btcudapl.cu”--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator.&lt;br /&gt;
** Review the variations of Armijo&#039;s rule (Grippo et al., 1986).&lt;br /&gt;
** Convert the current kernels from 2D into 3D kernels; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels.&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns the index of the input vector that produced it.&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in PyGTK.&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API.&lt;br /&gt;
*** Test for CUDA hardware availability when the solver is first loaded; make QRCUDA available only if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project on different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard (pageable) to pinned memory, which makes data transfers between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluation, so the CPU can evaluate small equation groups while the GPU is busy evaluating the large ones.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable in mass balance mode.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Clean-up of the prototype.&lt;br /&gt;
** The GPU initialization and shutdown methods were moved into QRCUDA.&lt;br /&gt;
** The dependency on the common makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** The Linux version of the BinCUDA makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented.&lt;br /&gt;
** A new form was added to the main GUI that shows information about the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unload bug was fixed during the clean-up.&lt;br /&gt;
* After 16-June&lt;br /&gt;
** More clean-ups in the BinCUDAs.&lt;br /&gt;
** The active block evaluation mechanism was added to the batch evaluator.&lt;br /&gt;
** QRCUDA now uses GPU-based model evaluation for residual evaluation in large blocks (the code was tested on {{srcbranch|arash|models/test/bintok/bincuda/test2.a4c}}; more testing is required).&lt;br /&gt;
** QRCUDA was tested with {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}}; after some bug fixes, the GPU evaluator results are now identical to those of the standard calc_residuals method.&lt;br /&gt;
* After 26-June&lt;br /&gt;
** The test case was modified to solve the distillation model in both mass balance and energy balance modes.&lt;br /&gt;
** Performance analysis with valgrind and gprof.&lt;br /&gt;
** Bug fix in PyGTK: the system is now re-analyzed after methods are executed.&lt;br /&gt;
** QRCUDA solved its first large model (31733 equations) in mass balance and energy balance modes; the results are identical to the QRSlv results. Both solvers converged, and the self_test method ran without errors ({{srcbranch|arash|models/test/bintok/bincuda/mwcolumn.a4c}}).&lt;br /&gt;
* After 6-July&lt;br /&gt;
** QRCUDA was integrated into PyGTK.&lt;br /&gt;
** QRCUDA now uses ASCEND&#039;s standard parameter handling mechanism.&lt;br /&gt;
** Functionality was added to QRCUDA that reports GPU block evaluation timing to PyGTK.&lt;br /&gt;
** An extensive search was carried out to create large, solvable models (larger than the current 30000 equations). During this search, QRCUDA was tested with different models, and several bugs in QRCUDA were identified and fixed.&lt;br /&gt;
** The next step is to create GPU-based line search.&lt;br /&gt;
* After 16-July&lt;br /&gt;
** The heuristic formula for the multi-vector residual evaluator was defined (Armijo rule).&lt;br /&gt;
** Research on the different variations of the Armijo rule was completed, and I decided to use 0.5 as the coefficient. The main reason is that 0.5^N can be computed as 1 divided by a left-shifted integer (2^N), which gives a large performance advantage over any other coefficient.&lt;br /&gt;
* After 26-July&lt;br /&gt;
** The evaluator kernels were converted from 2D into 3D kernels (the extra dimension indexes the input vectors generated with the Armijo rule).&lt;br /&gt;
** A parallel kernel was implemented to calculate the squared norm of the residuals.&lt;br /&gt;
*** The norm calculator was extended to find the minimum squared norm over all vectors in a multi-vector evaluation.&lt;br /&gt;
** A unit test was created for the multi-vector evaluators.&lt;br /&gt;
* After 6-August&lt;br /&gt;
** A concurrent kernel launcher (using CUDA streams) was implemented for the residual evaluator kernels. In a model with ~80000 relations, the evaluators now run 4x faster than in the previous version, which launched kernels sequentially.&lt;br /&gt;
** The multi-vector evaluators were tested, and their results were identical to those of the standard CPU-based evaluators.&lt;br /&gt;
** The multi-vector evaluators were integrated into the line-search algorithm of QRCUDA.&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
Ideas and issues with the current implementation are listed below (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. How can we optimize this function?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or later). The card must support double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and the BinCUDA Makefile must be pointed at the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; however, you must replace the Ubuntu 10.04 32-bit file addresses below with the equivalent files for your distribution from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1 and log in. Then stop the X server, run the driver installer and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now be running with the new NVIDIA driver, and you can install the CUDA 3.2 toolkit (the default directory, i.e. /usr/local/cuda, is recommended&amp;lt;!-- and ~/NVIDIA_GPU_Computing_SDK --&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
&amp;lt;!-- wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run --&amp;gt;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- 5) Now you should be able to compile the SDK samples,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples will be created in the &amp;quot;bin&amp;quot; directory. For example you should be able to run N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I REMOVED THESE STEPS AS NOW THE SDK SAMPLE DOWNLOAD IS NOT REQUIRED.. arash&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK, set the CUDA_INSTALL_PATH variable in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK installation directory (/usr/local/cuda if the default directory was used).&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be adjusted by multiplying the last two parameters of demo_column (100 and 51) by the same factor. For example, {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}} uses a factor of 5: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column ({{srcbranch|arash|models/test/bintok/bincuda/larg_distil_2.a4c}}): &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the tests ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test suite was prepared to test the QRCUDA solver and the generated CUDA model evaluator objects (i.e. BinCUDAs). The test suite code is located in test_bincuda.c ({{srcbranch|arash|ascend/compiler/test/test_bincuda.c}}) and contains six test functions: gen, satpnt, multivec, qrcuda, mwcol and bcacol.&lt;br /&gt;
You can run a test by executing &amp;quot;test/test compiler_bincuda.[test function name]&amp;quot; from the top-level ASCEND directory.&lt;br /&gt;
For more information about how QRCUDA and BinCUDAs interact, please refer to BinCUDA_Readme.txt ({{srcbranch|arash|ascend/bintokens/bincuda/BinCUDA_Readme.txt}}).&lt;br /&gt;
To change the current benchmark model, change the macro DEF_FILENAMESTEM&lt;br /&gt;
in the code. [Please note that if your model uses any ASCEND-specific&lt;br /&gt;
function (e.g. asc_ipow), that function must also be defined in the &lt;br /&gt;
btcudapl.cu ({{srcbranch|arash|ascend/bintokens/bincuda/btcudapl.cu}}) file.]   &lt;br /&gt;
&lt;br /&gt;
=== compiler_bincuda.gen === &lt;br /&gt;
This test function outputs the CPU-based evaluation time, the GPU-based evaluation &lt;br /&gt;
time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, and a Makefile in that directory &lt;br /&gt;
builds the shared binary object for the BinCUDAs. The CUDA &lt;br /&gt;
build and compile commands are provided in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
=== compiler_bincuda.satpnt ===&lt;br /&gt;
&lt;br /&gt;
In the multi-vector residual evaluator, the model is evaluated concurrently for multiple input vectors. Because the GPU evaluates these vectors in parallel, the evaluation time for a batch of inputs can stay equal to the evaluation time for a single input, as long as the GPU still has idle capacity. The &amp;quot;satpnt&amp;quot; test function determines the &#039;&#039;saturation point&#039;&#039; for a specific model. We define the saturation point as the maximum number of input vectors for which the computational time of concurrent residual evaluation is equal to the time measured for a single input vector.&lt;br /&gt;
&lt;br /&gt;
Please note that this test function measures only the computational time; the time for data transfer between CPU and GPU is not included in the results.&lt;br /&gt;
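&lt;br /&gt;
The saturation-point search can be sketched on the host side as follows. This is a hypothetical illustration, not the code in test_bincuda.c. It assumes that kernel times have already been measured for batches of 1, 2, ..., n input vectors. It treats a batch time as &amp;quot;equal&amp;quot; to the single-vector time when it lies within a relative tolerance tol. The function name saturation_point and the tolerance are assumptions.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=c&amp;gt;&lt;br /&gt;
#include &amp;lt;assert.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Hypothetical helper (not part of ASCEND): times[k] is the measured&lt;br /&gt;
 * kernel time for a batch of k + 1 input vectors, k = 0 .. n - 1.&lt;br /&gt;
 * Returns the largest batch size whose time stays within (1 + tol)&lt;br /&gt;
 * of the single-vector time; the scan stops at the first slower batch. */&lt;br /&gt;
size_t saturation_point(const double *times, size_t n, double tol)&lt;br /&gt;
{&lt;br /&gt;
    size_t k;&lt;br /&gt;
    size_t best = 1;&lt;br /&gt;
&lt;br /&gt;
    assert(times != NULL);&lt;br /&gt;
    assert(n &amp;gt; 0);&lt;br /&gt;
    for (k = 1; k &amp;lt; n; ++k) {&lt;br /&gt;
        if (times[k] &amp;gt; times[0] * (1.0 + tol))&lt;br /&gt;
            break; /* this batch is measurably slower than one vector */&lt;br /&gt;
        best = k + 1;&lt;br /&gt;
    }&lt;br /&gt;
    return best;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The scan stops at the first slower batch rather than taking the largest equal time anywhere in the list. Timings on a real GPU are noisy, and this choice keeps one isolated fast measurement further along from inflating the result.&lt;br /&gt;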
&lt;br /&gt;
=== compiler_bincuda.multivec ===&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;multivec&amp;quot; test function first verifies the results of the multi-vector evaluators against the standard CPU-based implementation provided in the ASCEND framework. It then measures the computational performance of the multi-vector evaluators.&lt;br /&gt;
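&lt;br /&gt;
The verification step amounts to an element-wise comparison of the GPU residuals with the CPU residuals. Below is a minimal sketch. It assumes both residual vectors are available on the host as plain double arrays and uses a mixed absolute/relative tolerance. The function name residuals_match and the tolerance scheme are hypothetical and are not taken from test_bincuda.c:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=c&amp;gt;&lt;br /&gt;
#include &amp;lt;assert.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
static double abs_value(double x)&lt;br /&gt;
{&lt;br /&gt;
    return x &amp;lt; 0.0 ? -x : x;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
/* Hypothetical check: returns 1 if every GPU residual matches the&lt;br /&gt;
 * CPU residual within atol + rtol * abs(cpu[i]), otherwise 0. */&lt;br /&gt;
int residuals_match(const double *cpu, const double *gpu, size_t n,&lt;br /&gt;
                    double rtol, double atol)&lt;br /&gt;
{&lt;br /&gt;
    size_t i;&lt;br /&gt;
&lt;br /&gt;
    assert(cpu != NULL);&lt;br /&gt;
    assert(gpu != NULL);&lt;br /&gt;
    for (i = 0; i &amp;lt; n; ++i) {&lt;br /&gt;
        if (abs_value(gpu[i] - cpu[i]) &amp;gt; atol + rtol * abs_value(cpu[i]))&lt;br /&gt;
            return 0; /* the first mismatch is enough to fail */&lt;br /&gt;
    }&lt;br /&gt;
    return 1;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The absolute term atol matters for residuals that are exactly or nearly zero. For those, a purely relative test would demand bit-identical results.&lt;br /&gt;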
&lt;br /&gt;
=== compiler_bincuda.qrcuda/mwcol/bcacol ===&lt;br /&gt;
&lt;br /&gt;
These test functions solve {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}}, {{srcbranch|arash|models/test/bintok/bincuda/mwcolumn.a4c}} and {{srcbranch|arash|models/test/bintok/bincuda/bcacolumn.a4c}}, respectively.&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2998</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2998"/>
		<updated>2011-08-18T11:55:52Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Progress */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use your solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to the “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernels&#039; memory access pattern to coalesced access&lt;br /&gt;
&amp;lt;!-- *** Store the mapping information into fast texture constant memory --&amp;gt;&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation. This way, the CPU can evaluate the small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
&amp;lt;!-- *** support for models containing &#039;external relations&#039;--&amp;gt;&lt;br /&gt;
** Prepare a &amp;lt;!-- multi platform --&amp;gt; Makefile to compile and build BinCUDAs&lt;br /&gt;
&amp;lt;!-- ** Complete the external functions in “btcudapl.cu”--&amp;gt;&lt;br /&gt;
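&lt;br /&gt;
The hybrid dispatch idea can be illustrated with a minimal sketch. It assumes that each equation group is described only by its number of relation instances, and that a single size threshold decides where the group is evaluated. The enum, the function assign_groups and the threshold are hypothetical. They do not describe the actual batch evaluator in relman.c:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=c&amp;gt;&lt;br /&gt;
#include &amp;lt;assert.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Hypothetical evaluation target for one equation group. */&lt;br /&gt;
enum eval_target { EVAL_ON_CPU = 0, EVAL_ON_GPU = 1 };&lt;br /&gt;
&lt;br /&gt;
/* Groups with at least gpu_threshold relations are marked for the GPU;&lt;br /&gt;
 * smaller ones are marked for the CPU, so that the CPU can evaluate them&lt;br /&gt;
 * while the GPU kernels run. Returns the number of GPU groups. */&lt;br /&gt;
size_t assign_groups(const size_t *group_sizes, size_t n_groups,&lt;br /&gt;
                     size_t gpu_threshold, enum eval_target *target)&lt;br /&gt;
{&lt;br /&gt;
    size_t g;&lt;br /&gt;
    size_t n_gpu = 0;&lt;br /&gt;
&lt;br /&gt;
    assert(group_sizes != NULL);&lt;br /&gt;
    assert(target != NULL);&lt;br /&gt;
    for (g = 0; g &amp;lt; n_groups; ++g) {&lt;br /&gt;
        if (group_sizes[g] &amp;gt;= gpu_threshold) {&lt;br /&gt;
            target[g] = EVAL_ON_GPU;&lt;br /&gt;
            ++n_gpu;&lt;br /&gt;
        } else {&lt;br /&gt;
            target[g] = EVAL_ON_CPU;&lt;br /&gt;
        }&lt;br /&gt;
    }&lt;br /&gt;
    return n_gpu;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A suitable threshold would have to be found by benchmarking. Kernel launch and data transfer overheads can make very small groups cheaper to evaluate on the CPU.&lt;br /&gt;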
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current kernels from 2D into 3D kernels; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns the index of the input vector with that norm&lt;br /&gt;
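&lt;br /&gt;
The following sketch is a host-side reference for what the minimum-norm kernel is meant to compute. It assumes that the residuals of all trial vectors are stored one vector after another, so that residual i of vector v is at index v * n_rel + i. This layout, like the function name argmin_sq_norm, is an assumption. The function computes each vector&#039;s squared 2-norm and returns the index of the smallest. A CUDA kernel would perform the same reduction in parallel.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=c&amp;gt;&lt;br /&gt;
#include &amp;lt;assert.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Hypothetical serial reference: resid holds n_vec blocks of n_rel&lt;br /&gt;
 * residuals (vector-major). Returns the index of the vector with the&lt;br /&gt;
 * smallest squared 2-norm and, if min_norm2 is not NULL, stores it. */&lt;br /&gt;
size_t argmin_sq_norm(const double *resid, size_t n_vec, size_t n_rel,&lt;br /&gt;
                      double *min_norm2)&lt;br /&gt;
{&lt;br /&gt;
    size_t v, i;&lt;br /&gt;
    size_t best = 0;&lt;br /&gt;
    double best_norm2 = 0.0;&lt;br /&gt;
&lt;br /&gt;
    assert(resid != NULL);&lt;br /&gt;
    assert(n_vec &amp;gt; 0);&lt;br /&gt;
    for (v = 0; v &amp;lt; n_vec; ++v) {&lt;br /&gt;
        const double *r = resid + v * n_rel; /* first residual of vector v */&lt;br /&gt;
        double norm2 = 0.0;&lt;br /&gt;
&lt;br /&gt;
        for (i = 0; i &amp;lt; n_rel; ++i)&lt;br /&gt;
            norm2 += r[i] * r[i];&lt;br /&gt;
        if (v == 0 || norm2 &amp;lt; best_norm2) {&lt;br /&gt;
            best = v;&lt;br /&gt;
            best_norm2 = norm2;&lt;br /&gt;
        }&lt;br /&gt;
    }&lt;br /&gt;
    if (min_norm2 != NULL)&lt;br /&gt;
        *min_norm2 = best_norm2;&lt;br /&gt;
    return best;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;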
&lt;br /&gt;
* Integrate the approach to QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator. &lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in PyGTK&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** implement testing of CUDA hardware availability when the solver is first loaded; only make QRCUDA available if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard to pinned. This makes data transfers between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluations, so the CPU can evaluate small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable in mass balance mode.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Clean-up of the prototype.&lt;br /&gt;
** The GPU init and shutdown methods were moved to QRCUDA.&lt;br /&gt;
** The dependency on the common makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** The Linux version of BinCUDA&#039;s makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented.&lt;br /&gt;
** A new form was added to the main GUI that shows information about the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unload bug was fixed during the clean-ups.&lt;br /&gt;
* After 16-June&lt;br /&gt;
** More clean-ups in the BinCUDAs.&lt;br /&gt;
** The active block evaluation mechanism was added to the batch evaluator.&lt;br /&gt;
** QRCUDA now uses GPU-based model evaluation for the residual evaluation in large blocks (the code was tested on {{srcbranch|arash|models/test/bintok/bincuda/test2.a4c}}; more testing is required).&lt;br /&gt;
** QRCUDA was tested with {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}} and, after some bug fixes, the GPU evaluator results are now identical to those of the standard calc_residuals method.&lt;br /&gt;
* After 26-June&lt;br /&gt;
** The test case was modified to solve the distillation model in both mass balance and energy balance mode.&lt;br /&gt;
** Performance analysis with valgrind and gprof.&lt;br /&gt;
** Bug fix in PyGTK: the system is now re-analyzed after methods are executed.&lt;br /&gt;
** QRCUDA solved its first large model (31733 equations) in mass balance and energy balance mode; the results are identical to the QRSlv results. Both solvers converged and the self_test method executed without any error ({{srcbranch|arash|models/test/bintok/bincuda/mwcolumn.a4c}}).&lt;br /&gt;
* After 6-July&lt;br /&gt;
** QRCUDA was integrated into PyGTK.&lt;br /&gt;
** QRCUDA now uses ASCEND&#039;s standard parameter handling mechanism. &lt;br /&gt;
** Functionality was added to QRCUDA that reports GPU block evaluation timing to PyGTK.&lt;br /&gt;
** An extensive search was carried out for large, solvable models (larger than the current ~30000 equations). During this search, QRCUDA was tested with different models, and several bugs in QRCUDA were identified and fixed.&lt;br /&gt;
** The next step is to create a GPU-based line search.&lt;br /&gt;
* After 16-July&lt;br /&gt;
** The heuristic formula for the multi-vector residual evaluator was defined (Armijo rule).&lt;br /&gt;
** Research on the different variations of the Armijo rule was completed, and I decided to use 0.5 as the step-reduction coefficient. The main reason is that (0.5)^N can be calculated with a left shift and a division (1/2^N). This gives a large performance advantage over any other coefficient. &lt;br /&gt;
* After 26-July&lt;br /&gt;
** The evaluator kernels were converted from 2D kernels into 3D kernels (the extra dimension is used for the input vectors created with the Armijo rule).&lt;br /&gt;
** A parallel kernel was implemented to calculate the squared norm of the residuals.&lt;br /&gt;
*** The norm calculator was extended to find the minimum squared norm over the vectors of a multi-vector evaluation.&lt;br /&gt;
** A unit test was created for testing the multi-vector evaluators.&lt;br /&gt;
* After 6-August &lt;br /&gt;
** A concurrent kernel launcher (streaming) was implemented for the residual evaluator kernels. In a model with ~80000 relations, the evaluators now execute 4x faster than in the previous version, which used a sequential kernel launcher.&lt;br /&gt;
** The multi-vector evaluators were tested, and the results were identical to those of the standard CPU-based evaluators.&lt;br /&gt;
** The multi-vector evaluators were integrated into the line-search algorithm of QRCUDA.&lt;br /&gt;
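&lt;br /&gt;
The choice of 0.5 as the Armijo coefficient can be illustrated with a shift-based computation of (0.5)^N in C. Because (0.5)^N = 1/2^N, and 2^N is a single left shift of an integer, the step sizes need no call to a general power function. The sketch also builds the batch of trial points x + (0.5)^k * dx, for k = 0, ..., n_vec - 1. The points are laid out one vector after another, in the form a multi-vector evaluator could receive them. The names, the layout and the 64-bit shift limit are assumptions; this is not the QRCUDA line-search code.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=c&amp;gt;&lt;br /&gt;
#include &amp;lt;assert.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* (0.5)^n computed as 1 / 2^n, where 2^n is a left shift.&lt;br /&gt;
 * n must be below 64 so the shift fits in unsigned long long. */&lt;br /&gt;
double half_pow(unsigned n)&lt;br /&gt;
{&lt;br /&gt;
    assert(n &amp;lt; 64);&lt;br /&gt;
    return 1.0 / (double)(1ULL &amp;lt;&amp;lt; n);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
/* Hypothetical batch builder: trial point k (k = 0 .. n_vec - 1) is&lt;br /&gt;
 * x + (0.5)^k * dx, i.e. the full step dx halved k times. Points are&lt;br /&gt;
 * written vector-major into trial (n_vec * n_var doubles). */&lt;br /&gt;
void build_armijo_trials(const double *x, const double *dx, size_t n_var,&lt;br /&gt;
                         unsigned n_vec, double *trial)&lt;br /&gt;
{&lt;br /&gt;
    unsigned k;&lt;br /&gt;
    size_t i;&lt;br /&gt;
&lt;br /&gt;
    assert(x != NULL);&lt;br /&gt;
    assert(dx != NULL);&lt;br /&gt;
    assert(trial != NULL);&lt;br /&gt;
    for (k = 0; k &amp;lt; n_vec; ++k) {&lt;br /&gt;
        double step = half_pow(k);&lt;br /&gt;
&lt;br /&gt;
        for (i = 0; i &amp;lt; n_var; ++i)&lt;br /&gt;
            trial[(size_t)k * n_var + i] = x[i] + step * dx[i];&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the approach described in the project plan, the residuals of all trial points are evaluated concurrently. The index returned by the minimum-norm kernel then identifies the chosen trial point.&lt;br /&gt;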
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
A list of ideas and issues with the current implementation is provided as follows (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is consumed in the rel_set_residual() calls. How can we optimize this function?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine must have an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or newer). The card must support double-precision floating point calculations (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and the BinCUDA Makefile must point to the CUDA installation directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following are step-by-step instructions for installing the CUDA SDK on an Ubuntu 10.04 32-bit machine. The installation process on other Linux distributions is quite similar. However, the Ubuntu 10.04 32-bit download addresses used below should be replaced with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, issue:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
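2) Switch to a text console by pressing CTRL+ALT+F1 and log in. Then issue the following commands to stop the X server, run the driver installer and restart the X server:&lt;br /&gt;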
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) The X server should now restart with the new NVIDIA driver, and you should be able to install the CUDA 3.2 toolkit. Installing into the default directory, /usr/local/cuda, is recommended. &amp;lt;!-- and ~/NVIDIA_GPU_Computing_SDK --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
&amp;lt;!-- wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run --&amp;gt;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- 5) Now you should be able to compile the SDK samples,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples will be created in the &amp;quot;bin&amp;quot; directory. For example you should be able to run N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I REMOVED THESE STEPS AS NOW THE SDK SAMPLE DOWNLOAD IS NOT REQUIRED.. arash&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK, set the CUDA_INSTALL_PATH variable in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}) to the CUDA installation directory (/usr/local/cuda by default).&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be adjusted by multiplying the last two parameters of demo_column (100 and 51) by the same factor. For example, {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}} uses a factor of 5: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column ({{srcbranch|arash|models/test/bintok/bincuda/larg_distil_2.a4c}}): &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranch|arash|ascend/compiler/test/test_bincuda.c}}). &lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the &lt;br /&gt;
top-level ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation &lt;br /&gt;
time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, and a Makefile in that directory &lt;br /&gt;
builds the shared binary object for the BinCUDAs. The CUDA &lt;br /&gt;
build and compile commands are provided in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the current benchmark model, change the macro FILENAMESTEM&lt;br /&gt;
in the code. [Please note that if your model uses any ASCEND-specific&lt;br /&gt;
function (e.g. asc_ipow), that function must also be defined in the &lt;br /&gt;
btcudapl.cu ({{srcbranch|arash|ascend/bintokens/bincuda/btcudapl.cu}}) file.]   &lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;br /&gt;
[[Category:ASCEND Contributors]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2985</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2985"/>
		<updated>2011-08-12T11:19:23Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Progress */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use your solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to the “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernels&#039; memory access pattern to coalesced access&lt;br /&gt;
&amp;lt;!-- *** Store the mapping information into fast texture constant memory --&amp;gt;&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation. This way, the CPU can evaluate the small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
&amp;lt;!-- *** support for models containing &#039;external relations&#039;--&amp;gt;&lt;br /&gt;
** Prepare a &amp;lt;!-- multi platform --&amp;gt; Makefile to compile and build BinCUDAs&lt;br /&gt;
&amp;lt;!-- ** Complete the external functions in “btcudapl.cu”--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current kernels from 2D into 3D kernels; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns the index of the input vector with that norm&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach to QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator. &lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in PyGTK&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** implement testing of CUDA hardware availability when the solver is first loaded; only make QRCUDA available if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard to pinned. This makes data transfers between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluations, so the CPU can evaluate small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable in mass balance mode.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Clean-up of the prototype.&lt;br /&gt;
** The GPU init and shutdown methods were moved to QRCUDA.&lt;br /&gt;
** The dependency on the common makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** The Linux version of BinCUDA&#039;s makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented.&lt;br /&gt;
** A new form was added to the main GUI that shows information about the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unload bug was fixed during the clean-ups.&lt;br /&gt;
* After 16-June&lt;br /&gt;
** More clean-ups in the BinCUDAs.&lt;br /&gt;
** The active block evaluation mechanism was added to the batch evaluator.&lt;br /&gt;
** QRCUDA now uses GPU-based model evaluation for the residual evaluation in large blocks (the code was tested on {{srcbranch|arash|models/test/bintok/bincuda/test2.a4c}}; more testing is required).&lt;br /&gt;
** QRCUDA was tested with {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}} and, after some bug fixes, the GPU evaluator results are now identical to those of the standard calc_residuals method.&lt;br /&gt;
* After 26-June&lt;br /&gt;
** The test case was modified to solve the distillation model in both mass balance and energy balance mode.&lt;br /&gt;
** Performance analysis with valgrind and gprof.&lt;br /&gt;
** Bug fix in PyGTK: the system is now re-analyzed after methods are executed.&lt;br /&gt;
** QRCUDA solved its first large model (31733 equations) in mass balance and energy balance mode; the results are identical to the QRSlv results. Both solvers converged and the self_test method executed without any error ({{srcbranch|arash|models/test/bintok/bincuda/mwcolumn.a4c}}).&lt;br /&gt;
* After 6-July&lt;br /&gt;
** QRCUDA was integrated into PyGTK.&lt;br /&gt;
** QRCUDA now uses ASCEND&#039;s standard parameter handling mechanism. &lt;br /&gt;
** Functionality was added to QRCUDA that reports GPU block evaluation timing to PyGTK.&lt;br /&gt;
** An extensive search was carried out for large, solvable models (larger than the current ~30000 equations). During this search, QRCUDA was tested with different models, and several bugs in QRCUDA were identified and fixed.&lt;br /&gt;
** The next step is to create a GPU-based line search.&lt;br /&gt;
* After 16-July&lt;br /&gt;
** The heuristic formula for the multi-vector residual evaluator was defined (Armijo rule).&lt;br /&gt;
** Research on the different variations of the Armijo rule was completed, and I decided to use 0.5 as the step-reduction coefficient. The main reason is that (0.5)^N can be calculated with a left shift and a division (1/2^N). This gives a large performance advantage over any other coefficient. &lt;br /&gt;
* After 26-July&lt;br /&gt;
** The evaluator kernels were converted from 2D kernels into 3D kernels (the extra dimension is used for the input vectors created with the Armijo rule).&lt;br /&gt;
** A parallel kernel was implemented to calculate the squared norm of the residuals.&lt;br /&gt;
*** The norm calculator was extended to find the minimum squared norm over the vectors of a multi-vector evaluation.&lt;br /&gt;
** A unit test was created for testing the multi-vector evaluators.&lt;br /&gt;
* After 6-August &lt;br /&gt;
** A concurrent kernel launcher (streaming) was implemented for the residual evaluator kernels. In a model with ~80000 relations, the evaluators now execute 4x faster than in the previous version, which used a sequential kernel launcher.&lt;br /&gt;
** The multi-vector evaluators were tested, and the results were identical to those of the standard CPU-based evaluators.&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
A list of ideas and issues with the current implementation is provided as follows (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is consumed in the rel_set_residual() calls. How can we optimize this function?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine must have an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or newer). The card must support double-precision floating point calculations (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and the BinCUDA Makefile must point to the CUDA installation directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following are step-by-step instructions for installing the CUDA SDK on an Ubuntu 10.04 32-bit machine. The installation process on other Linux distributions is quite similar. However, the Ubuntu 10.04 32-bit download addresses used below should be replaced with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make the installer executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
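2) Press CTRL+ALT+F1 to switch to a text console and log in. Then, from the directory containing the driver installer, stop the X server (GDM), run the installer and restart X:&lt;br /&gt;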
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. You can then install the CUDA 3.2 toolkit; using the default directory (/usr/local/cuda&amp;lt;!-- and ~/NVIDIA_GPU_Computing_SDK --&amp;gt;) is recommended.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
&amp;lt;!-- wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run --&amp;gt;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
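&lt;br /&gt;
Appending these lines more than once (or sourcing ~/.bashrc repeatedly) adds duplicate entries, and if LD_LIBRARY_PATH was previously empty the result starts with an empty entry, which the dynamic loader generally treats as the current directory. A minimal sketch of a guarded alternative for ~/.bashrc, assuming bash; path_append is a hypothetical helper name, not something provided by CUDA or ASCEND:&lt;br /&gt;

```sh
# Hypothetical helper: append DIR to the colon-separated variable named VAR
# (e.g. PATH) only if DIR is not already listed, then export VAR.
path_append() {
    local var=$1 dir=$2 current
    eval "current=\${$var}"             # current value of the named variable
    case ":$current:" in
        *":$dir:"*) ;;                  # already present: leave unchanged
        *)
            if [ -z "$current" ]; then
                eval "$var=\$dir"       # empty or unset: avoid a leading colon
            else
                eval "$var=\$current:\$dir"
            fi
            ;;
    esac
    export "$var"
}

path_append PATH /usr/local/cuda/bin
path_append LD_LIBRARY_PATH /usr/local/cuda/lib
```

Running either call a second time leaves the variable unchanged.&lt;br /&gt;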
&lt;br /&gt;
&amp;lt;!-- 5) Now you should be able to compile the SDK samples,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples will be created in the &amp;quot;bin&amp;quot; directory. For example you should be able to run N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I REMOVED THESE STEPS AS NOW THE SDK SAMPLE DOWNLOAD IS NOT REQUIRED.. arash&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK, set the CUDA_INSTALL_PATH variable in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK directory (/usr/local/cuda if the default location was used in step 3).&lt;br /&gt;
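&lt;br /&gt;
Before building, it may help to confirm that the directory given to CUDA_INSTALL_PATH really contains the CUDA compiler. A minimal sketch, assuming the Makefile expects the toolkit root, i.e. the directory holding bin/nvcc (as /usr/local/cuda does after step 3); check_cuda_install_path is a hypothetical helper, not part of ASCEND:&lt;br /&gt;

```sh
# Hypothetical helper: check that a candidate CUDA_INSTALL_PATH contains an
# executable nvcc. Assumed layout: ROOT/bin/nvcc, as in /usr/local/cuda.
check_cuda_install_path() {
    local root=${1:-/usr/local/cuda}    # default directory from step 3
    if [ -x "$root/bin/nvcc" ]; then
        echo "OK: $root/bin/nvcc found"
        "$root/bin/nvcc" --version      # prints the toolkit release
    else
        echo "ERROR: no executable nvcc under $root/bin"
        return 1
    fi
}
```

For example, check_cuda_install_path /usr/local/cuda prints the nvcc version on success and returns a non-zero status if nvcc is missing.&lt;br /&gt;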
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be adjusted by multiplying its two size parameters, 100 and 51, by a multiplicative factor. For example, {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}} multiplies both by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be instantiated instead of a single one ({{srcbranch|arash|models/test/bintok/bincuda/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranch|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level&lt;br /&gt;
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
The test case reports the CPU-based evaluation time, the GPU-based evaluation&lt;br /&gt;
time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, and the Makefile in that same directory&lt;br /&gt;
builds the shared binary object for the BinCUDAs. The CUDA&lt;br /&gt;
build and compile commands are defined in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro&lt;br /&gt;
in test_bincuda.c. [Note that if your model uses any ASCEND-specific&lt;br /&gt;
function (e.g. asc_ipow), that function must also be defined in the&lt;br /&gt;
btcudapl.cu file ({{srcbranch|arash|ascend/bintokens/bincuda/btcudapl.cu}}).]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;br /&gt;
[[Category:ASCEND Contributors]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2971</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2971"/>
		<updated>2011-08-07T11:40:45Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Progress */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use the solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernels&#039; memory access pattern to coalesced access&lt;br /&gt;
&amp;lt;!-- *** Store the mapping information into fast texture constant memory --&amp;gt;&lt;br /&gt;
*** Change the memory management model from standard (pageable) to PINNED (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation, so that the CPU can evaluate the small equation groups while the GPU is busy with the large ones.&lt;br /&gt;
&amp;lt;!-- *** support for models containing &#039;external relations&#039;--&amp;gt;&lt;br /&gt;
** Prepare a &amp;lt;!-- multi platform --&amp;gt; Makefile to compile and build BinCUDAs&lt;br /&gt;
&amp;lt;!-- ** Complete the external functions in “btcudapl.cu”--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current kernels from 2D into 3D; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns the index of the corresponding input vector&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in PyGTK&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** Test CUDA hardware availability when the solver is first loaded; make QRCUDA available only if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard to PINNED. This makes data transfer between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluation, so the CPU can handle small equation groups while the GPU is busy evaluating the large ones.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable in mass balance mode.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Cleaned up the prototype.&lt;br /&gt;
** The GPU init and shutdown methods were moved to QRCUDA.&lt;br /&gt;
** The dependency on the common Makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** The Linux version of BinCUDA&#039;s Makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented.&lt;br /&gt;
** A new form was added to the main GUI that shows information about the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unload bug was fixed during the clean-ups.&lt;br /&gt;
* After 16-June&lt;br /&gt;
** More clean-ups in the BinCUDAs.&lt;br /&gt;
** The active block evaluation mechanism was added to the batch evaluator.&lt;br /&gt;
** QRCUDA now uses GPU-based model evaluation for the residuals of large blocks (tested on {{srcbranch|arash|models/test/bintok/bincuda/test2.a4c}}; more testing is required).&lt;br /&gt;
** QRCUDA was tested with {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}}; after some bug fixes, the GPU evaluator results are now identical to those of the standard calc_residuals method.&lt;br /&gt;
* After 26-June&lt;br /&gt;
** The test case was modified to solve the distillation model in both mass balance and energy balance modes.&lt;br /&gt;
** Performance analysis with valgrind and gprof.&lt;br /&gt;
** Fixed a bug in PyGTK so that the system is now re-analyzed after methods are executed.&lt;br /&gt;
** QRCUDA solved its first large model (31733 equations) in both mass balance and energy balance modes, with results identical to those of QRSlv. Both solvers converged, and the self_test method ran without errors ({{srcbranch|arash|models/test/bintok/bincuda/mwcolumn.a4c}}).&lt;br /&gt;
* After 6-July&lt;br /&gt;
** QRCUDA was integrated into PyGTK.&lt;br /&gt;
** QRCUDA now uses ASCEND&#039;s standard parameter handling mechanism.&lt;br /&gt;
** Functionality was added to QRCUDA to report GPU block evaluation timing to PyGTK.&lt;br /&gt;
** An extensive search was carried out for large, solvable models (larger than the current 30000 equations). During this search, QRCUDA was tested with different models, and several bugs in QRCUDA were identified and fixed.&lt;br /&gt;
** The next step is to create a GPU-based line search.&lt;br /&gt;
* After 16-July&lt;br /&gt;
** The heuristic formula for the multi-vector residual evaluator was defined (Armijo rule).&lt;br /&gt;
** Research on the different variations of the Armijo rule was completed, and I decided to use 0.5 as the coefficient. The main reason is that (0.5)^N can be calculated with a combination of shift-left and divide operators, which gives a large performance advantage over any other coefficient.&lt;br /&gt;
* After 26-July&lt;br /&gt;
** The evaluator kernels were converted from 2D into 3D kernels (the extra dimension indexes the input vectors created with the Armijo rule).&lt;br /&gt;
** A parallel kernel was implemented to calculate the squared norm of the residuals.&lt;br /&gt;
*** The norm calculator was extended to find the minimum squared norm for multi-vector evaluation.&lt;br /&gt;
** A unit test was created for the multi-vector evaluators.&lt;br /&gt;
* After 6-August &lt;br /&gt;
** A concurrent (streamed) kernel launcher was implemented for the residual evaluator kernels. In a model with ~80000 relations, the evaluators now run 4x faster than in the previous version, which launched the kernels sequentially.&lt;br /&gt;
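&lt;br /&gt;
With a coefficient of 0.5 the candidate step lengths are 1, 1/2, 1/4, ..., i.e. 2^-k, so each one is just half of the previous one. The following serial sketch only illustrates the multi-vector selection described in the entries above: evaluate every candidate point x + 2^-k d and keep the one with the smallest squared residual norm (in QRCUDA the candidates are evaluated in parallel by the 3D kernels). It is not the QRCUDA code; the helper name best_halving_step, the toy residual r(x) = x^2 - 2, the start point x, the direction d and the number of candidates K are all assumptions for illustration, and the sufficient-decrease test of a full Armijo rule is left out.&lt;br /&gt;

```sh
# Illustrative sketch only (not QRCUDA code): evaluate the K candidate points
# x + 2^-k * d, k = 0..K-1, i.e. the step lengths given by a 0.5 coefficient,
# and report the one with the smallest squared residual norm.
# The residual r(x) = x^2 - 2 and the arguments are arbitrary assumptions.
best_halving_step() {
    awk -v x="$1" -v d="$2" -v K="$3" 'BEGIN {
        best_k = -1
        step = 1.0                          # 0.5^0
        for (k = 0; K - k > 0; k++) {       # k = 0 .. K-1
            xk = x + step * d               # candidate point
            r = xk * xk - 2                 # toy residual
            sq = r * r                      # squared residual norm
            if (best_k == -1 || best - sq > 0) {
                best = sq; best_k = k; best_x = xk; best_step = step
            }
            step = step / 2                 # next power of 0.5: just a halving
        }
        printf "k=%d step=%g x=%g norm2=%g\n", best_k, best_step, best_x, best
    }'
}

best_halving_step 1 2 5
```

With x = 1, d = 2 and K = 5 the full step overshoots (x = 3), and the sketch prints k=2 step=0.25 x=1.5 norm2=0.0625.&lt;br /&gt;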
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
The following ideas and issues relate to the current implementation (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. How can this function be optimized?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large-scale demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or newer). The card must support double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and BinCUDA&#039;s Makefile must be pointed to the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The process on other Linux distributions is quite similar, but the Ubuntu 10.04 32-bit download addresses used below must be replaced with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make the installer executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
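2) Press CTRL+ALT+F1 to switch to a text console and log in. Then, from the directory containing the driver installer, stop the X server (GDM), run the installer and restart X:&lt;br /&gt;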
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. You can then install the CUDA 3.2 toolkit; using the default directory (/usr/local/cuda&amp;lt;!-- and ~/NVIDIA_GPU_Computing_SDK --&amp;gt;) is recommended.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
&amp;lt;!-- wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run --&amp;gt;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- 5) Now you should be able to compile the SDK samples,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples will be created in the &amp;quot;bin&amp;quot; directory. For example you should be able to run N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I REMOVED THESE STEPS AS NOW THE SDK SAMPLE DOWNLOAD IS NOT REQUIRED.. arash&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK, set the CUDA_INSTALL_PATH variable in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK directory (/usr/local/cuda if the default location was used in step 3).&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be adjusted by multiplying its two size parameters, 100 and 51, by a multiplicative factor. For example, {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}} multiplies both by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be instantiated instead of a single one ({{srcbranch|arash|models/test/bintok/bincuda/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranch|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level&lt;br /&gt;
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
The test case reports the CPU-based evaluation time, the GPU-based evaluation&lt;br /&gt;
time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, and the Makefile in that same directory&lt;br /&gt;
builds the shared binary object for the BinCUDAs. The CUDA&lt;br /&gt;
build and compile commands are defined in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro&lt;br /&gt;
in test_bincuda.c. [Note that if your model uses any ASCEND-specific&lt;br /&gt;
function (e.g. asc_ipow), that function must also be defined in the&lt;br /&gt;
btcudapl.cu file ({{srcbranch|arash|ascend/bintokens/bincuda/btcudapl.cu}}).]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;br /&gt;
[[Category:ASCEND Contributors]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2946</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2946"/>
		<updated>2011-08-04T05:02:46Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Project Plan */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use the solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernels&#039; memory access pattern to coalesced access&lt;br /&gt;
&amp;lt;!-- *** Store the mapping information into fast texture constant memory --&amp;gt;&lt;br /&gt;
*** Change the memory management model from standard (pageable) to PINNED (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation, so that the CPU can evaluate the small equation groups while the GPU is busy with the large ones.&lt;br /&gt;
&amp;lt;!-- *** support for models containing &#039;external relations&#039;--&amp;gt;&lt;br /&gt;
** Prepare a &amp;lt;!-- multi platform --&amp;gt; Makefile to compile and build BinCUDAs&lt;br /&gt;
&amp;lt;!-- ** Complete the external functions in “btcudapl.cu”--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current kernels from 2D into 3D; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns the index of the corresponding input vector&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in PyGTK&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** Test CUDA hardware availability when the solver is first loaded; make QRCUDA available only if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard to PINNED. This makes data transfer between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluation, so the CPU can handle small equation groups while the GPU is busy evaluating the large ones.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable in mass balance mode.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Cleaned up the prototype.&lt;br /&gt;
** The GPU init and shutdown methods were moved to QRCUDA.&lt;br /&gt;
** The dependency on the common Makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** The Linux version of BinCUDA&#039;s Makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented.&lt;br /&gt;
** A new form was added to the main GUI that shows information about the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unload bug was fixed during the clean-ups.&lt;br /&gt;
* After 16-June&lt;br /&gt;
** More clean-ups in the BinCUDAs.&lt;br /&gt;
** The active block evaluation mechanism was added to the batch evaluator.&lt;br /&gt;
** QRCUDA now uses GPU-based model evaluation for the residuals of large blocks (tested on {{srcbranch|arash|models/test/bintok/bincuda/test2.a4c}}; more testing is required).&lt;br /&gt;
** QRCUDA was tested with {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}}; after some bug fixes, the GPU evaluator results are now identical to those of the standard calc_residuals method.&lt;br /&gt;
* After 26-June&lt;br /&gt;
** The test case was modified to solve the distillation model in both mass balance and energy balance modes.&lt;br /&gt;
** Performance analysis with valgrind and gprof.&lt;br /&gt;
** Fixed a bug in PyGTK so that the system is now re-analyzed after methods are executed.&lt;br /&gt;
** QRCUDA solved its first large model (31733 equations) in both mass balance and energy balance modes, with results identical to those of QRSlv. Both solvers converged, and the self_test method ran without errors ({{srcbranch|arash|models/test/bintok/bincuda/mwcolumn.a4c}}).&lt;br /&gt;
* After 6-July&lt;br /&gt;
** QRCUDA was integrated into PyGTK.&lt;br /&gt;
** QRCUDA now uses ASCEND&#039;s standard parameter handling mechanism.&lt;br /&gt;
** Functionality was added to QRCUDA to report GPU block evaluation timing to PyGTK.&lt;br /&gt;
** An extensive search was carried out for large, solvable models (larger than the current 30000 equations). During this search, QRCUDA was tested with different models, and several bugs in QRCUDA were identified and fixed.&lt;br /&gt;
** The next step is to create a GPU-based line search.&lt;br /&gt;
* After 16-July&lt;br /&gt;
** The heuristic formula for the multi-vector residual evaluator was defined (Armijo rule).&lt;br /&gt;
** Research on the different variations of the Armijo rule was completed, and I decided to use 0.5 as the coefficient. The main reason is that (0.5)^N can be calculated with a combination of shift-left and divide operators, which gives a large performance advantage over any other coefficient.&lt;br /&gt;
* After 26-July&lt;br /&gt;
** The evaluator kernels were converted from 2D into 3D kernels (the extra dimension indexes the input vectors created with the Armijo rule).&lt;br /&gt;
** A parallel kernel was implemented to calculate the squared norm of the residuals.&lt;br /&gt;
*** The norm calculator was extended to find the minimum squared norm for multi-vector evaluation.&lt;br /&gt;
** A unit test was created for the multi-vector evaluators.&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
The following ideas and issues relate to the current implementation (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. How can this function be optimized?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large-scale demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or newer). The card must support double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and BinCUDA&#039;s Makefile must be pointed to the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following steps install the CUDA SDK on an Ubuntu 10.04 32-bit machine. The installation process on other Linux distributions is quite similar, but the Ubuntu 10.04 32-bit file addresses below should be replaced with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1 and log in. Then stop the X server (GDM), run the driver installer and restart GDM:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
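&lt;br /&gt;
Before continuing, it may be worth confirming that the new kernel module is actually loaded. While the NVIDIA kernel module is loaded, the driver reports its version in /proc/driver/nvidia/version, and the small check below reads that file. The function name nvidia_driver_version is hypothetical. The optional argument only lets the check be pointed at another file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# Hypothetical check: print the version line of the loaded NVIDIA kernel module&lt;br /&gt;
nvidia_driver_version() {&lt;br /&gt;
  f=${1:-/proc/driver/nvidia/version}&lt;br /&gt;
  if [ -r "$f" ]; then&lt;br /&gt;
    head -n 1 "$f"&lt;br /&gt;
  else&lt;br /&gt;
    echo "NVIDIA kernel module not loaded ($f not readable)"&lt;br /&gt;
    return 1&lt;br /&gt;
  fi&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
nvidia_driver_version || echo "Re-run the driver installer from step 2"&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;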
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. You can then install the CUDA 3.2 toolkit (the default directory, /usr/local/cuda, is recommended):&amp;lt;!-- and ~/NVIDIA_GPU_Computing_SDK). --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
&amp;lt;!-- wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run --&amp;gt;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
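# avoid an empty entry (which means the current directory) when LD_LIBRARY_PATH is unset&lt;br /&gt;
LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/local/cuda/lib&lt;br /&gt;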
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
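&lt;br /&gt;
After opening a new terminal (or running . ~/.bashrc), you can check the settings with a sketch like the one below. path_contains is a hypothetical helper that tests whether a directory is one of the colon-separated entries of a path list. nvcc --version prints the version of the installed CUDA compiler.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# Hypothetical helper: succeed if DIR ($2) is an entry of the colon-separated LIST ($1)&lt;br /&gt;
path_contains() {&lt;br /&gt;
  case ":$1:" in&lt;br /&gt;
    *":$2:"*) return 0 ;;&lt;br /&gt;
    *) return 1 ;;&lt;br /&gt;
  esac&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
if path_contains "$PATH" /usr/local/cuda/bin; then&lt;br /&gt;
  echo "PATH contains /usr/local/cuda/bin"&lt;br /&gt;
else&lt;br /&gt;
  echo "PATH is missing /usr/local/cuda/bin"&lt;br /&gt;
fi&lt;br /&gt;
if path_contains "$LD_LIBRARY_PATH" /usr/local/cuda/lib; then&lt;br /&gt;
  echo "LD_LIBRARY_PATH contains /usr/local/cuda/lib"&lt;br /&gt;
else&lt;br /&gt;
  echo "LD_LIBRARY_PATH is missing /usr/local/cuda/lib"&lt;br /&gt;
fi&lt;br /&gt;
if command -v nvcc; then&lt;br /&gt;
  nvcc --version&lt;br /&gt;
else&lt;br /&gt;
  echo "nvcc not found on PATH"&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;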
&lt;br /&gt;
&amp;lt;!-- 5) Now you should be able to compile the SDK samples,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples will be created in the &amp;quot;bin&amp;quot; directory. For example you should be able to run N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I REMOVED THESE STEPS AS NOW THE SDK SAMPLE DOWNLOAD IS NOT REQUIRED.. arash&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK, set the CUDA_INSTALL_PATH variable in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK installation directory (/usr/local/cuda if the default was used in step 3).&lt;br /&gt;
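&lt;br /&gt;
For example, assuming the Makefile assigns CUDA_INSTALL_PATH on a line of its own (with =, := or ?=), the hypothetical helper below rewrites that assignment with GNU sed and prints the result. Alternatively, GNU make lets a variable given on the command line (make CUDA_INSTALL_PATH=/usr/local/cuda) override an ordinary assignment in the makefile, so the file does not need to be edited.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# Hypothetical helper: set CUDA_INSTALL_PATH in MAKEFILE ($1) to DIR ($2)&lt;br /&gt;
# Requires GNU sed (-i); DIR must not contain a | character&lt;br /&gt;
set_cuda_install_path() {&lt;br /&gt;
  makefile=$1&lt;br /&gt;
  dir=$2&lt;br /&gt;
  sed -i "s|^CUDA_INSTALL_PATH[[:space:]]*[:?]*=.*|CUDA_INSTALL_PATH = $dir|" "$makefile"&lt;br /&gt;
  grep '^CUDA_INSTALL_PATH' "$makefile"&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Example, from the top level of the arash branch checkout:&lt;br /&gt;
# set_cuda_install_path ascend/bintokens/bincuda/Makefile /usr/local/cuda&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;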
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. The number of relations can be adjusted by scaling its two parameters, 100 and 51, by the same factor. For example, {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}} multiplies both by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be used instead of a single column ({{srcbranch|arash|models/test/bintok/bincuda/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranch|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; in the top-level ASCEND directory.&lt;br /&gt;
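&lt;br /&gt;
You can wrap the same command so that it runs from the right directory regardless of where it is invoked. This is a minimal sketch: run_bincuda_test is a hypothetical name. It only assumes that the CUnit runner test/test mentioned above has already been built under the top-level ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# Hypothetical wrapper: run the BinCUDA CUnit test from the top-level ASCEND directory ($1)&lt;br /&gt;
run_bincuda_test() {&lt;br /&gt;
  top=$1&lt;br /&gt;
  if [ ! -x "$top/test/test" ]; then&lt;br /&gt;
    echo "No executable test/test under $top; build the ASCEND tests first"&lt;br /&gt;
    return 1&lt;br /&gt;
  fi&lt;br /&gt;
  # Run in a subshell so the working directory of the caller is left unchanged&lt;br /&gt;
  ( cd "$top" || exit 1&lt;br /&gt;
    ./test/test compiler_bincuda.gen )&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Example (replace the path with your own checkout):&lt;br /&gt;
# run_bincuda_test ~/ascend&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;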
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation time and the number of equations in the model.&lt;br /&gt;
It writes the generated code to the &amp;quot;/tmp&amp;quot; directory, and the Makefile in that directory builds the shared binary object for the BinCUDAs.&lt;br /&gt;
The CUDA build and compile commands are defined in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro in the test code.&lt;br /&gt;
[Note that if your model uses any ASCEND-specific function (e.g. asc_ipow), that function must also be defined in btcudapl.cu ({{srcbranch|arash|ascend/bintokens/bincuda/btcudapl.cu}}).]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2942</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2942"/>
		<updated>2011-08-02T12:34:05Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Progress */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use the solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernels&#039; memory access pattern to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture or constant memory&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation, so that the CPU can evaluate the small equation groups while the GPU is busy evaluating the large ones.&lt;br /&gt;
*** Support models containing &#039;external relations&#039;&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert current kernels from 2D kernels into 3D; the extra dimension is used for each input vector.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns the index of the corresponding input vector&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in PyGTK&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** Test CUDA hardware availability when the solver is first loaded; only make QRCUDA available if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard to PINNED. This makes data transfers between host and device twice as fast.&lt;br /&gt;
** Batch evaluator can now perform hybrid CPU/GPU evaluations so that the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable in mass balance mode.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Clean-up of the prototype&lt;br /&gt;
** The GPU init and shutdown methods were moved to QRCUDA.&lt;br /&gt;
** The dependency on the common makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** The Linux version of BinCUDA&#039;s Makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented&lt;br /&gt;
** A new form was added to the main GUI that shows information about the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unload bug was fixed during the clean-ups.&lt;br /&gt;
* After 16-June&lt;br /&gt;
** More clean-ups in the BinCUDAs.&lt;br /&gt;
** The active block evaluation mechanism was added to the batch evaluator.&lt;br /&gt;
** QRCUDA now uses GPU-based evaluation for the residuals of large blocks (the code was tested on {{srcbranch|arash|models/test/bintok/bincuda/test2.a4c}}; more testing is required).&lt;br /&gt;
** QRCUDA was tested with {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}} and, after some bug fixes, the GPU evaluator results are now identical to those of the standard calc_residuals method.&lt;br /&gt;
* After 26-June&lt;br /&gt;
** The test case was modified to solve the distillation model in both mass balance and energy balance modes.&lt;br /&gt;
** Performance analysis with valgrind and gprof.&lt;br /&gt;
** Bug fix in PyGTK: the system is now re-analyzed after methods are executed.&lt;br /&gt;
** QRCUDA solved its first large model (31733 equations) in both mass balance and energy balance modes, with results identical to those of QRSlv. Both solvers converged and the self_test method ran without errors ({{srcbranch|arash|models/test/bintok/bincuda/mwcolumn.a4c}}).&lt;br /&gt;
* After 6-July&lt;br /&gt;
** QRCUDA was integrated into PyGTK.&lt;br /&gt;
** QRCUDA now uses ASCEND&#039;s standard parameter handling mechanism.&lt;br /&gt;
** QRCUDA can now report GPU block evaluation timing to PyGTK.&lt;br /&gt;
** An extensive search was carried out for large, solvable models (larger than the current 30000 equations). During this search QRCUDA was tested with different models, and several bugs in QRCUDA were identified and fixed.&lt;br /&gt;
** The next step is to create a GPU-based line search.&lt;br /&gt;
* After 16-July&lt;br /&gt;
** The heuristic formula for the multi-vector residual evaluator was defined (Armijo rule).&lt;br /&gt;
** Research on the different variations of the Armijo rule was completed, and I decided to use 0.5 as the coefficient. The main reason is that (0.5)^N = 1/2^N can be calculated with a left shift followed by a single division, which is cheaper than computing a power of any other coefficient.&lt;br /&gt;
* After 26-July&lt;br /&gt;
** The evaluator kernels were converted from 2D into 3D kernels (the extra dimension is used for the input vectors created with the Armijo rule)&lt;br /&gt;
** A parallel kernel was implemented to calculate the squared norm of the residuals&lt;br /&gt;
*** The norm calculator was extended to find the minimum squared norm in multi-vector evaluation&lt;br /&gt;
** A unit test was created for the multi-vector evaluators&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
Ideas and open issues with the current implementation are listed below (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. How can this function be optimized?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably of the Fermi or a more recent architecture) that supports double-precision floating-point calculations (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and BinCUDA&#039;s Makefile must be pointed at the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following steps install the CUDA SDK on an Ubuntu 10.04 32-bit machine. The installation process on other Linux distributions is quite similar, but the Ubuntu 10.04 32-bit file addresses below should be replaced with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1 and log in. Then stop the X server (GDM), run the driver installer and restart GDM:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. You can then install the CUDA 3.2 toolkit (the default directory, /usr/local/cuda, is recommended):&amp;lt;!-- and ~/NVIDIA_GPU_Computing_SDK). --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
&amp;lt;!-- wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run --&amp;gt;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
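# avoid an empty entry (which means the current directory) when LD_LIBRARY_PATH is unset&lt;br /&gt;
LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/local/cuda/lib&lt;br /&gt;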
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- 5) Now you should be able to compile the SDK samples,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples will be created in the &amp;quot;bin&amp;quot; directory. For example you should be able to run N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I REMOVED THESE STEPS AS NOW THE SDK SAMPLE DOWNLOAD IS NOT REQUIRED.. arash&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK, set the CUDA_INSTALL_PATH variable in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK installation directory (/usr/local/cuda if the default was used in step 3).&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. The number of relations can be adjusted by scaling its two parameters, 100 and 51, by the same factor. For example, {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}} multiplies both by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be used instead of a single column ({{srcbranch|arash|models/test/bintok/bincuda/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranch|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; in the top-level ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation time and the number of equations in the model.&lt;br /&gt;
It writes the generated code to the &amp;quot;/tmp&amp;quot; directory, and the Makefile in that directory builds the shared binary object for the BinCUDAs.&lt;br /&gt;
The CUDA build and compile commands are defined in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro in the test code.&lt;br /&gt;
[Note that if your model uses any ASCEND-specific function (e.g. asc_ipow), that function must also be defined in btcudapl.cu ({{srcbranch|arash|ascend/bintokens/bincuda/btcudapl.cu}}).]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2863</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2863"/>
		<updated>2011-07-14T03:43:55Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Test models */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use the solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernels&#039; memory access pattern to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture or constant memory&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation, so that the CPU can evaluate the small equation groups while the GPU is busy evaluating the large ones.&lt;br /&gt;
*** Support models containing &#039;external relations&#039;&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert current kernels from 2D kernels into 3D; the extra dimension is used for each input vector.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns the index of the corresponding input vector&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in PyGTK&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** Test CUDA hardware availability when the solver is first loaded; only make QRCUDA available if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard to PINNED. This makes data transfers between host and device twice as fast.&lt;br /&gt;
** Batch evaluator can now perform hybrid CPU/GPU evaluations so that the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable in mass balance mode.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Clean-up of the prototype&lt;br /&gt;
** The GPU init and shutdown methods were moved to QRCUDA.&lt;br /&gt;
** The dependency on the common makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** The Linux version of BinCUDA&#039;s Makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented&lt;br /&gt;
** A new form was added to the main GUI that shows information about the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unload bug was fixed during the clean-ups.&lt;br /&gt;
* After 16-June&lt;br /&gt;
** More clean-ups in the BinCUDAs.&lt;br /&gt;
** The active block evaluation mechanism was added to the batch evaluator.&lt;br /&gt;
** QRCUDA now uses GPU-based evaluation for the residuals of large blocks (the code was tested on {{srcbranch|arash|models/test/bintok/bincuda/test2.a4c}}; more testing is required).&lt;br /&gt;
** QRCUDA was tested with {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}} and, after some bug fixes, the GPU evaluator results are now identical to those of the standard calc_residuals method.&lt;br /&gt;
* After 26-June&lt;br /&gt;
** The test case was modified to solve the distillation model in both mass balance and energy balance modes.&lt;br /&gt;
** Performance analysis with valgrind and gprof.&lt;br /&gt;
** Bug fix in PyGTK: the system is now re-analyzed after methods are executed.&lt;br /&gt;
** QRCUDA solved its first large model (31733 equations) in both mass balance and energy balance modes, with results identical to those of QRSlv. Both solvers converged and the self_test method ran without errors ({{srcbranch|arash|models/test/bintok/bincuda/mwcolumn.a4c}}).&lt;br /&gt;
* After 6-July&lt;br /&gt;
** QRCUDA was integrated into PyGTK.&lt;br /&gt;
** QRCUDA now uses ASCEND&#039;s standard parameter handling mechanism.&lt;br /&gt;
** QRCUDA can now report GPU block evaluation timing to PyGTK.&lt;br /&gt;
** An extensive search was carried out for large, solvable models (larger than the current 30000 equations). During this search QRCUDA was tested with different models, and several bugs in QRCUDA were identified and fixed.&lt;br /&gt;
** The next step is to create a GPU-based line search.&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
Ideas and open issues with the current implementation are listed below (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. How can this function be optimized?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably of the Fermi or a more recent architecture) that supports double-precision floating-point calculations (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and BinCUDA&#039;s Makefile must be pointed at the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following steps install the CUDA SDK on an Ubuntu 10.04 32-bit machine. The installation process on other Linux distributions is quite similar, but the Ubuntu 10.04 32-bit file addresses below should be replaced with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1 and log in. Then stop the X server (GDM), run the driver installer and restart GDM:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver, and you can install the CUDA 3.2 toolkit. Using the default installation directory (/usr/local/cuda) is recommended.&amp;lt;!-- and ~/NVIDIA_GPU_Computing_SDK). --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
&amp;lt;!-- wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run --&amp;gt;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
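&lt;br /&gt;
The lines in step 4 append the CUDA directories every time ~/.bashrc is read, so nested shells collect duplicate entries in PATH and LD_LIBRARY_PATH. Below is a minimal sketch of an idempotent alternative for ~/.bashrc. The helper name path_append is hypothetical and is not part of ASCEND or the CUDA toolkit. On 64-bit systems the CUDA libraries are normally installed in /usr/local/cuda/lib64 rather than /usr/local/cuda/lib.&lt;br /&gt;
&lt;br /&gt;
```shell
# Minimal sketch for ~/.bashrc (hypothetical helper, not part of ASCEND or CUDA):
# append a directory to a colon-separated search-path variable only if it
# is not already listed.
path_append() {
    # $1 = name of the variable (e.g. PATH), $2 = directory to append
    local name="$1" dir="$2" current
    eval "current=\${$name}"            # read the variable whose name is in $1
    case ":$current:" in
        *":$dir:"*) ;;                                  # already present: do nothing
        "::") eval "export $name=\"\$dir\"" ;;          # variable empty or unset
        *)    eval "export $name=\"\$current:\$dir\"" ;;
    esac
}

path_append PATH /usr/local/cuda/bin
path_append LD_LIBRARY_PATH /usr/local/cuda/lib    # use lib64 on 64-bit systems
```

Because the helper checks before appending, sourcing ~/.bashrc repeatedly leaves each variable unchanged after the first time.&lt;br /&gt;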
&lt;br /&gt;
&amp;lt;!-- 5) Now you should be able to compile the SDK samples,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples will be created in the &amp;quot;bin&amp;quot; directory. For example you should be able to run N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I REMOVED THESE STEPS AS NOW THE SDK SAMPLE DOWNLOAD IS NOT REQUIRED.. arash&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK, set the CUDA_INSTALL_PATH variable in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK installation directory (/usr/local/cuda if the default was used).&lt;br /&gt;
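&lt;br /&gt;
The variable can also be set with a script. The sketch below assumes that the Makefile assigns CUDA_INSTALL_PATH at the start of a line using =, := or ?=, so check your copy before relying on it. The function name set_cuda_path is hypothetical. The script keeps the original assignment operator and saves the unmodified file with a .bak suffix.&lt;br /&gt;
&lt;br /&gt;
```shell
# Minimal sketch (hypothetical helper): set CUDA_INSTALL_PATH in a Makefile.
# Assumes the variable is assigned at the start of a line with =, := or ?=.
set_cuda_path() {
    # $1 = Makefile to edit, $2 = CUDA installation directory
    makefile="$1"
    cuda_dir="$2"
    pattern='^\(CUDA_INSTALL_PATH[[:space:]]*[?:]*=\).*'
    if ! grep -q "$pattern" "$makefile"; then
        echo "CUDA_INSTALL_PATH assignment not found in $makefile" >/dev/stderr
        return 1
    fi
    # Keep the original assignment operator (\1) and replace only the value;
    # the unmodified file is saved as "$makefile.bak".
    sed -i.bak "s|$pattern|\1 $cuda_dir|" "$makefile"
}

# Example, run from the top-level ASCEND directory:
# set_cuda_path ascend/bintokens/bincuda/Makefile /usr/local/cuda
```

If you run make yourself, GNU make also lets a command-line assignment such as make CUDA_INSTALL_PATH=/usr/local/cuda override the value in the Makefile without editing the file.&lt;br /&gt;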
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be scaled by multiplying its two parameters, 100 and 51, by the same factor. For example, {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}} uses a factor of 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be instantiated instead of a single one ({{srcbranch|arash|models/test/bintok/bincuda/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
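&lt;br /&gt;
Scaled variants of the single-column model can be generated by a script instead of being edited by hand. The following sketch writes a copy of larg_distil with both parameters multiplied by an integer factor, so a factor of 5 reproduces the 500 and 255 values shown earlier. The function name write_scaled_distil is hypothetical. Check the resulting number of equations in ASCEND rather than assuming it scales exactly with the factor.&lt;br /&gt;
&lt;br /&gt;
```shell
# Minimal sketch (hypothetical helper): write a scaled copy of the larg_distil
# model, multiplying both column parameters (100 and 51) by the same factor.
write_scaled_distil() {
    # $1 = positive integer scale factor, $2 = output .a4c file
    factor="$1"
    out="$2"
    case "$factor" in
        ''|*[!0-9]*)
            echo "scale factor must be a positive integer" >/dev/stderr
            return 1 ;;
    esac
    if [ "$factor" -lt 1 ]; then
        echo "scale factor must be a positive integer" >/dev/stderr
        return 1
    fi
    p1=$((100 * factor))
    p2=$((51 * factor))
    comps="['n_butane','n_pentane','n_hexane','n_heptane','n_octane','n_nonane','n_decane']"
    {
        echo 'REQUIRE "column.a4l";'
        echo 'MODEL larg_distil() REFINES test_demo_column();'
        echo '        demo IS_A'
        echo "        demo_column($comps,'n_decane',$p1,$p2);"
        echo 'METHODS'
        echo 'END larg_distil;'
    } > "$out"
}

# Example: reproduce the factor-5 model.
# write_scaled_distil 5 larg_distil.a4c
```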
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranch|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level&lt;br /&gt;
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
The test case reports the CPU-based evaluation time, the GPU-based evaluation&lt;br /&gt;
time and the number of equations in the model.&lt;br /&gt;
It generates the BinCUDA code in the &amp;quot;/tmp&amp;quot; directory, where a Makefile in the same directory&lt;br /&gt;
builds the shared binary object for the BinCUDAs. The CUDA&lt;br /&gt;
build and compile commands are defined in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro&lt;br /&gt;
in the test code. [Note that if your model uses any ASCEND-specific&lt;br /&gt;
function (e.g. asc_ipow), that function must also be defined in the&lt;br /&gt;
btcudapl.cu file ({{srcbranch|arash|ascend/bintokens/bincuda/btcudapl.cu}}).]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2862</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2862"/>
		<updated>2011-07-14T03:41:58Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Progress */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up, test and use the solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to the “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change kernels memory access pattern to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture or constant memory&lt;br /&gt;
*** Change the memory management model from standard (pageable) memory to pinned (page-locked) memory, which makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation, so that the CPU can evaluate the small equation groups while the GPU is busy evaluating the large ones.&lt;br /&gt;
*** support for models containing &#039;external relations&#039;&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current 2D kernels into 3D kernels, using the extra dimension to index the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the input vector with the lowest residual norm and returns its index&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in PyGTK&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** implement testing of CUDA hardware availability when the solver is first loaded; only make QRCUDA available if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
*  Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard to pinned memory, which makes data transfer between host and device twice as fast.&lt;br /&gt;
** Batch evaluator can now perform hybrid CPU/GPU evaluations so that the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable in mass balance mode.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Clean-up of the prototype.&lt;br /&gt;
** The GPU init and shutdown methods were moved to QRCUDA.&lt;br /&gt;
** The dependency on the common Makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** A Linux version of the BinCUDA Makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented.&lt;br /&gt;
** A new form was added to the main GUI that shows information about the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unloading bug was fixed during the clean-ups.&lt;br /&gt;
* After 16-June&lt;br /&gt;
** More clean-ups in the BinCUDAs.&lt;br /&gt;
** The active block evaluation mechanism was added to the batch evaluator.&lt;br /&gt;
** QRCUDA now uses GPU-based model evaluation for residual evaluation in large blocks (the code was tested on {{srcbranch|arash|models/test/bintok/bincuda/test2.a4c}}; more testing is required).&lt;br /&gt;
** QRCUDA was tested with {{srcbranch|arash|models/test/bintok/bincuda/larg_distil.a4c}}, and after some bug fixes the GPU evaluator now gives results identical to those of the standard calc_residuals method.&lt;br /&gt;
* After 26-June&lt;br /&gt;
** The test case was modified to solve the distillation model in both mass balance and energy balance mode.&lt;br /&gt;
** Performance analysis was carried out with valgrind and gprof.&lt;br /&gt;
** A bug was fixed in PyGTK so that the system is now re-analyzed after methods are executed.&lt;br /&gt;
** QRCUDA solved its first large model (31733 equations) in both mass balance and energy balance mode, with results identical to those of QRSlv. Both solvers converged, and the self_test method ran without any errors ({{srcbranch|arash|models/test/bintok/bincuda/mwcolumn.a4c}}).&lt;br /&gt;
* After 6-July&lt;br /&gt;
** QRCUDA was integrated into PyGTK.&lt;br /&gt;
** QRCUDA now uses ASCEND&#039;s standard parameter handling mechanism.&lt;br /&gt;
** Functionality was added to QRCUDA to report GPU block evaluation timing to PyGTK.&lt;br /&gt;
** An extensive search was carried out for large, solvable models (larger than the current 30000 equations). During this search QRCUDA was tested with different models, and several bugs in QRCUDA were identified and fixed.&lt;br /&gt;
** The next step is to create a GPU-based line search.&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
A list of ideas and issues with the current implementation is provided as follows (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in calls to rel_set_residual(). How can this function be optimized?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine must have an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or newer) that supports double-precision floating-point arithmetic, i.e. compute capability 1.3 or higher (compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and the NVIDIA developer driver must be installed on the host machine, and the BinCUDA Makefile must be pointed at the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; replace the Ubuntu 10.04 32-bit file addresses below with the equivalent files for your distribution from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) Download the developer driver and make the installer executable by issuing the following commands in a terminal window:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, and then stop the X server (gdm), run the driver installer and restart X by issuing&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver, and you can install the CUDA 3.2 toolkit. Using the default installation directory (/usr/local/cuda) is recommended.&amp;lt;!-- and ~/NVIDIA_GPU_Computing_SDK). --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
&amp;lt;!-- wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run --&amp;gt;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- 5) Now you should be able to compile the SDK samples,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples will be created in the &amp;quot;bin&amp;quot; directory. For example you should be able to run N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I REMOVED THESE STEPS AS NOW THE SDK SAMPLE DOWNLOAD IS NOT REQUIRED.. arash&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK, set the CUDA_INSTALL_PATH variable in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK installation directory (/usr/local/cuda if the default was used).&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be scaled by multiplying its two parameters, 100 and 51, by the same factor. For example, {{srcbranch|arash|models/test/bintok/larg_distil.a4c}} uses a factor of 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be instantiated instead of a single one ({{srcbranch|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranch|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level&lt;br /&gt;
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
The test case reports the CPU-based evaluation time, the GPU-based evaluation&lt;br /&gt;
time and the number of equations in the model.&lt;br /&gt;
It generates the BinCUDA code in the &amp;quot;/tmp&amp;quot; directory, where a Makefile in the same directory&lt;br /&gt;
builds the shared binary object for the BinCUDAs. The CUDA&lt;br /&gt;
build and compile commands are defined in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro&lt;br /&gt;
in the test code. [Note that if your model uses any ASCEND-specific&lt;br /&gt;
function (e.g. asc_ipow), that function must also be defined in the&lt;br /&gt;
btcudapl.cu file ({{srcbranch|arash|ascend/bintokens/bincuda/btcudapl.cu}}).]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2846</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2846"/>
		<updated>2011-07-13T13:48:16Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Progress */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up, test and use the solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to the “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change kernels memory access pattern to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture or constant memory&lt;br /&gt;
*** Change the memory management model from standard (pageable) memory to pinned (page-locked) memory, which makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation, so that the CPU can evaluate the small equation groups while the GPU is busy evaluating the large ones.&lt;br /&gt;
*** support for models containing &#039;external relations&#039;&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current 2D kernels into 3D kernels, using the extra dimension to index the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the input vector with the lowest residual norm and returns its index&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in PyGTK&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** implement testing of CUDA hardware availability when the solver is first loaded; only make QRCUDA available if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
*  Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard to pinned memory, which makes data transfer between host and device twice as fast.&lt;br /&gt;
** Batch evaluator can now perform hybrid CPU/GPU evaluations so that the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable in mass balance mode.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Clean-up of the prototype.&lt;br /&gt;
** The GPU init and shutdown methods were moved to QRCUDA.&lt;br /&gt;
** The dependency on the common Makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** A Linux version of the BinCUDA Makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented.&lt;br /&gt;
** A new form was added to the main GUI that shows information about the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unloading bug was fixed during the clean-ups.&lt;br /&gt;
* After 16-June&lt;br /&gt;
** More clean-ups in the BinCUDAs.&lt;br /&gt;
** The active block evaluation mechanism was added to the batch evaluator.&lt;br /&gt;
** QRCUDA now uses GPU-based model evaluation for residual evaluation in large blocks (the code was tested on {{srcbranch|arash|models/test/bintok/test2.a4c}}; more testing is required).&lt;br /&gt;
** QRCUDA was tested with {{srcbranch|arash|models/test/bintok/larg_distil.a4c}}, and after some bug fixes the GPU evaluator now gives results identical to those of the standard calc_residuals method.&lt;br /&gt;
* After 26-June&lt;br /&gt;
** The test case was modified to solve the distillation model in both mass balance and energy balance mode.&lt;br /&gt;
** Performance analysis was carried out with valgrind and gprof.&lt;br /&gt;
** A bug was fixed in PyGTK so that the system is now re-analyzed after methods are executed.&lt;br /&gt;
** QRCUDA solved its first large model (31733 equations) in both mass balance and energy balance mode, with results identical to those of QRSlv. Both solvers converged, and the self_test method ran without any errors.&lt;br /&gt;
* After 6-July&lt;br /&gt;
** QRCUDA was integrated into PyGTK.&lt;br /&gt;
** QRCUDA now uses ASCEND&#039;s standard parameter handling mechanism.&lt;br /&gt;
** Functionality was added to QRCUDA to report GPU block evaluation timing to PyGTK.&lt;br /&gt;
** An extensive search was carried out for large, solvable models (larger than the current 30000 equations). During this search QRCUDA was tested with different models, and several bugs in QRCUDA were identified and fixed.&lt;br /&gt;
** The next step is to create a GPU-based line search.&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
A list of ideas and issues with the current implementation is provided as follows (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in calls to rel_set_residual(). How can this function be optimized?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine must have an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or newer) that supports double-precision floating-point arithmetic, i.e. compute capability 1.3 or higher (compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and the NVIDIA developer driver must be installed on the host machine, and the BinCUDA Makefile must be pointed at the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; replace the Ubuntu 10.04 32-bit file addresses below with the equivalent files for your distribution from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) Download the developer driver and make the installer executable by issuing the following commands in a terminal window:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, and then stop the X server (gdm), run the driver installer and restart X by issuing&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver, and you can install the CUDA 3.2 toolkit. Using the default installation directory (/usr/local/cuda) is recommended.&amp;lt;!-- and ~/NVIDIA_GPU_Computing_SDK). --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
&amp;lt;!-- wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run --&amp;gt;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- 5) Now you should be able to compile the SDK samples,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples will be created in the &amp;quot;bin&amp;quot; directory. For example you should be able to run N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I REMOVED THESE STEPS AS NOW THE SDK SAMPLE DOWNLOAD IS NOT REQUIRED.. arash&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA toolkit, set the CUDA_INSTALL_PATH variable in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}) to the CUDA installation directory (by default /usr/local/cuda).&lt;br /&gt;
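If you prefer not to edit the Makefile by hand, the assignment can be rewritten from the shell. This is a sketch under assumptions: set_cuda_install_path is a hypothetical helper, and it assumes the Makefile contains a line starting with CUDA_INSTALL_PATH followed by =, := or ?=. It relies on the GNU sed -i option, which is available on Ubuntu.&lt;br /&gt;

```sh
# Hypothetical helper: rewrite the CUDA_INSTALL_PATH assignment in a Makefile
# so that it points at DIR, keeping the original assignment operator
# (=, := or ?=). Requires GNU sed for in-place editing (-i).
set_cuda_install_path() {
    makefile=$1
    dir=${2:-/usr/local/cuda}
    if ! grep -q '^CUDA_INSTALL_PATH[[:space:]]*[?:]*=' "$makefile"; then
        echo "no CUDA_INSTALL_PATH assignment found in $makefile"
        return 1
    fi
    # \1 keeps the variable name and operator; everything after it is replaced.
    sed -i "s|^\(CUDA_INSTALL_PATH[[:space:]]*[?:]*=\).*|\1 $dir|" "$makefile"
}
```

For example: set_cuda_install_path ascend/bintokens/bincuda/Makefile /usr/local/cuda. When make is invoked directly, GNU make also lets a command-line assignment such as make CUDA_INSTALL_PATH=/usr/local/cuda override the value in the file, unless the Makefile uses the override directive.&lt;br /&gt;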
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations can be adjusted by multiplying its two numeric parameters, 100 and 51, by the same factor. For example, {{srcbranch|arash|models/test/bintok/larg_distil.a4c}} uses a factor of 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
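Scaled variants can also be derived mechanically from the base model. The hypothetical scale_model helper below multiplies both numeric demo_column arguments by the same integer factor and prints the resulting model; it assumes the arguments appear literally as ,100,51) in the source file, as in the base model shown earlier.&lt;br /&gt;

```sh
# Hypothetical helper: print a copy of an a4c model in which the two numeric
# demo_column arguments 100 and 51 are multiplied by the same integer FACTOR.
# It assumes the arguments appear literally as ",100,51)" in the file.
scale_model() {
    factor=$1
    model=$2
    first=$((100 * factor))
    second=$((51 * factor))
    sed "s/,100,51)/,$first,$second)/" "$model"
}
```

For example, scale_model 5 applied to the base model reproduces the 500,255 arguments used in larg_distil.a4c; redirect the output to a new .a4c file to keep it.&lt;br /&gt;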
&lt;br /&gt;
Alternatively, several columns can be used instead of a single column ({{srcbranch|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranch|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level&lt;br /&gt;
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
The test case reports the CPU-based evaluation time, the GPU-based evaluation&lt;br /&gt;
time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, and the Makefile located in that directory&lt;br /&gt;
builds the shared binary object for the BinCUDAs. The CUDA&lt;br /&gt;
build and compile commands are provided in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro&lt;br /&gt;
in the test code. [Please note that if your model uses any ASCEND-specific&lt;br /&gt;
function (e.g. asc_ipow), that function must also be defined in the&lt;br /&gt;
btcudapl.cu ({{srcbranch|arash|ascend/bintokens/bincuda/btcudapl.cu}}) file.]&lt;br /&gt;
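Because FILENAMESTEM is a C preprocessor macro, changing the benchmark model means editing the source and rebuilding the test program. The helper below is a sketch under stated assumptions: set_benchmark_model is hypothetical, it is meant to be run from the top-level ASCEND directory, and it assumes the macro is defined on a single line of the form #define FILENAMESTEM &amp;quot;some_model&amp;quot;. Check the existing definition to see whether the stem should include a directory.&lt;br /&gt;

```sh
# Hypothetical helper, run from the top-level ASCEND directory. It assumes the
# benchmark model is selected by a single line of the form
#   #define FILENAMESTEM "some_model"
# in the test source (GNU sed is required for in-place editing).
set_benchmark_model() {
    stem=$1
    src=${2:-ascend/compiler/test/test_bincuda.c}
    if ! grep -q '^#define FILENAMESTEM ' "$src"; then
        echo "no FILENAMESTEM definition found in $src"
        return 1
    fi
    sed -i "s|^#define FILENAMESTEM .*|#define FILENAMESTEM \"$stem\"|" "$src"
}
```

For example, after set_benchmark_model larg_distil_2, rebuild the test suite and run test/test compiler_bincuda.gen as described above.&lt;br /&gt;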
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2788</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2788"/>
		<updated>2011-07-05T02:01:14Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Installing CUDA SDK on Linux */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use the solver&lt;br /&gt;
** General architecture improvements&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernels&#039; memory access pattern to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture/constant memory&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation. This way, the CPU can evaluate the small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
*** Support models containing &#039;external relations&#039;&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current kernels from 2D to 3D; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns the index of the corresponding input vector&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the bintoken unloading bug&lt;br /&gt;
** Fix the bintoken auto-rebuild detection feature in PyGTK&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** Test for CUDA hardware availability when the solver is first loaded; only make QRCUDA available if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard (pageable) to pinned memory. This made data transfers between host and device twice as fast.&lt;br /&gt;
** Batch evaluator can now perform hybrid CPU/GPU evaluations so that the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable in mass balance mode.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Clean-up of the prototype.&lt;br /&gt;
** The GPU initialization and shutdown methods were moved into QRCUDA.&lt;br /&gt;
** The dependency on the common makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** The Linux version of BinCUDA&#039;s Makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented.&lt;br /&gt;
** A new form was added to the main GUI that shows information about the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unloading bug was fixed during the clean-ups.&lt;br /&gt;
* After 16-June&lt;br /&gt;
** More clean-ups in the BinCUDAs.&lt;br /&gt;
** The active block evaluation mechanism was added to the batch evaluator.&lt;br /&gt;
** QRCUDA now uses GPU-based model evaluation for the residuals of large blocks (the code was tested on {{srcbranch|arash|models/test/bintok/test2.a4c}}; more testing is required).&lt;br /&gt;
** QRCUDA was tested with {{srcbranch|arash|models/test/bintok/larg_distil.a4c}} and, after some bug fixes, the GPU evaluator results are now identical to those of the standard calc_residuals method.&lt;br /&gt;
* After 26-June&lt;br /&gt;
** The test case was modified to solve the distillation model in both mass balance and energy balance mode.&lt;br /&gt;
** Performance analysis with valgrind and gprof.&lt;br /&gt;
** Bug fix in PyGTK: the system is now re-analyzed after methods are executed.&lt;br /&gt;
** QRCUDA solved its first large model (31733 equations) in mass balance and energy balance mode, and the results are identical to the QRSlv results. Both solvers converged and the self_test method ran without any errors.&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
Ideas and issues with the current implementation are listed below (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. How can this function be optimized?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or newer). The card must support &#039;double&#039; precision floating-point calculations (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and BinCUDA&#039;s Makefile must point to the CUDA installation directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following steps install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; replace the Ubuntu 10.04 32-bit download addresses below with the equivalent files for your distribution from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make the installer executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
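The hypothetical helper below wraps step 1: it downloads an installer only if it is not already in the current directory, then makes it executable. It is a minimal sketch; it does not check the integrity of the file, and a partially downloaded file would be treated as present (wget -c can resume such a download).&lt;br /&gt;

```sh
# Hypothetical helper: download URL into the current directory unless a file
# with the same name is already there, then make it executable.
fetch_installer() {
    url=$1
    file=${url##*/}   # strip everything up to the last slash
    if [ ! -f "$file" ]; then
        wget "$url" || return 1
    fi
    chmod +x "./$file"
}
```

For example: fetch_installer http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;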
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, and then stop the X server (gdm), run the driver installer and restart gdm:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver, and you can install the CUDA 3.2 toolkit (the default directory, i.e. /usr/local/cuda&amp;lt;!-- and ~/NVIDIA_GPU_Computing_SDK --&amp;gt;, is recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
&amp;lt;!-- wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run --&amp;gt;&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- 5) Now you should be able to compile the SDK samples,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples will be created in the &amp;quot;bin&amp;quot; directory. For example you should be able to run N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I REMOVED THESE STEPS AS NOW THE SDK SAMPLE DOWNLOAD IS NOT REQUIRED.. arash&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA toolkit, set the CUDA_INSTALL_PATH variable in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}) to the CUDA installation directory (by default /usr/local/cuda).&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations can be adjusted by multiplying its two numeric parameters, 100 and 51, by the same factor. For example, {{srcbranch|arash|models/test/bintok/larg_distil.a4c}} uses a factor of 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be used instead of a single column ({{srcbranch|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranch|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level&lt;br /&gt;
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
The test case reports the CPU-based evaluation time, the GPU-based evaluation&lt;br /&gt;
time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, and the Makefile located in that directory&lt;br /&gt;
builds the shared binary object for the BinCUDAs. The CUDA&lt;br /&gt;
build and compile commands are provided in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro&lt;br /&gt;
in the test code. [Please note that if your model uses any ASCEND-specific&lt;br /&gt;
function (e.g. asc_ipow), that function must also be defined in the&lt;br /&gt;
btcudapl.cu ({{srcbranch|arash|ascend/bintokens/bincuda/btcudapl.cu}}) file.]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2787</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2787"/>
		<updated>2011-07-05T01:48:16Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Progress */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use the solver&lt;br /&gt;
** General architecture improvements&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernels&#039; memory access pattern to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture/constant memory&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation. This way, the CPU can evaluate the small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
*** Support models containing &#039;external relations&#039;&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current kernels from 2D to 3D; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns the index of the corresponding input vector&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the bintoken unloading bug&lt;br /&gt;
** Fix the bintoken auto-rebuild detection feature in PyGTK&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** Test for CUDA hardware availability when the solver is first loaded; only make QRCUDA available if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard (pageable) to pinned memory. This made data transfers between host and device twice as fast.&lt;br /&gt;
** Batch evaluator can now perform hybrid CPU/GPU evaluations so that the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable in mass balance mode.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Clean-up of the prototype.&lt;br /&gt;
** The GPU initialization and shutdown methods were moved into QRCUDA.&lt;br /&gt;
** The dependency on the common makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** The Linux version of BinCUDA&#039;s Makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented.&lt;br /&gt;
** A new form was added to the main GUI that shows information about the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unloading bug was fixed during the clean-ups.&lt;br /&gt;
* After 16-June&lt;br /&gt;
** More clean-ups in the BinCUDAs.&lt;br /&gt;
** The active block evaluation mechanism was added to the batch evaluator.&lt;br /&gt;
** QRCUDA now uses GPU-based model evaluation for the residuals of large blocks (the code was tested on {{srcbranch|arash|models/test/bintok/test2.a4c}}; more testing is required).&lt;br /&gt;
** QRCUDA was tested with {{srcbranch|arash|models/test/bintok/larg_distil.a4c}} and, after some bug fixes, the GPU evaluator results are now identical to those of the standard calc_residuals method.&lt;br /&gt;
* After 26-June&lt;br /&gt;
** The test case was modified to solve the distillation model in both mass balance and energy balance mode.&lt;br /&gt;
** Performance analysis with valgrind and gprof.&lt;br /&gt;
** Bug fix in PyGTK: the system is now re-analyzed after methods are executed.&lt;br /&gt;
** QRCUDA solved its first large model (31733 equations) in mass balance and energy balance mode, and the results are identical to the QRSlv results. Both solvers converged and the self_test method ran without any errors.&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
Ideas and issues with the current implementation are listed below (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. How can this function be optimized?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or newer). The card must support &#039;double&#039; precision floating-point calculations (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and BinCUDA&#039;s Makefile must point to the CUDA installation directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following steps install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; replace the Ubuntu 10.04 32-bit download addresses below with the equivalent files for your distribution from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make the installer executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, and then stop the X server (gdm), run the driver installer and restart gdm:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver, and you can install the CUDA 3.2 toolkit and SDK samples (the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, are recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) You should now be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
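A failed build in step 5 only shows up as a missing binary in step 6. The hypothetical run_sdk_sample helper below checks that the compiled sample exists before running it from its own directory, as step 6 does; the default directory is the one used above.&lt;br /&gt;

```sh
# Hypothetical helper: run one compiled SDK sample by name, after checking
# that the build in step 5 actually produced it.
run_sdk_sample() {
    name=$1
    bindir=${2:-$HOME/NVIDIA_GPU_Computing_SDK/C/bin/linux/release}
    if [ ! -x "$bindir/$name" ]; then
        echo "sample $name not found in $bindir; did step 5 succeed?"
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$bindir" || exit 1; "./$name")
}
```

For example: run_sdk_sample nbody&lt;br /&gt;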
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA toolkit, set the CUDA_INSTALL_PATH variable in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}) to the CUDA installation directory (by default /usr/local/cuda).&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations can be adjusted by multiplying its two numeric parameters, 100 and 51, by the same factor. For example, {{srcbranch|arash|models/test/bintok/larg_distil.a4c}} uses a factor of 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be used instead of a single column ({{srcbranch|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranch|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level&lt;br /&gt;
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, GPU-based evaluation &lt;br /&gt;
time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, and the Makefile located in that same directory &lt;br /&gt;
is responsible for building the shared binary object for BinCUDAs. The CUDA &lt;br /&gt;
build and compile commands are provided in the Makefile ({{srcbranch|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro&lt;br /&gt;
in the code. [Note that if your model uses any ASCEND-specific&lt;br /&gt;
function (e.g. asc_ipow), that function must also be defined in the &lt;br /&gt;
btcudapl.cu file ({{srcbranch|arash|ascend/bintokens/bincuda/btcudapl.cu}}).]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=Publications&amp;diff=2756</id>
		<title>Publications</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=Publications&amp;diff=2756"/>
		<updated>2011-06-29T05:53:37Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Conference presentations and papers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Note: not all the PDF links are present in the following text. If a link is missing, please use the original [http://www.ascend4.org/ascend_bibliography.htm ASCEND bibliography] page to find the document.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
In date order, divided according to category.&lt;br /&gt;
&lt;br /&gt;
== Doctoral theses ==&lt;br /&gt;
&lt;br /&gt;
V Rico-Ramirez, &#039;&#039;Representation, Analysis and Solution of Conditional Models in an Equation-Based Environment&#039;&#039;, Ph.D. Thesis, Carnegie Mellon University, 1998. [http://ascend.cheme.cmu.edu/ftp/pdfThesis/victhesis.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
K A Abbott, &#039;&#039;Very Large Scale Modeling&#039;&#039;, Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA, 1996. [http://ascend.cheme.cmu.edu/ftp/pdfThesis/kirkthesis.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
B A Allan, &#039;&#039;A More Reusable Modeling System&#039;&#039;, Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA, 1997. [http://ascend.cheme.cmu.edu/ftp/pdfThesis/benthesis.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
B T Safrit, &#039;&#039;Synthesis of Azeotropic Batch Distillation Separation Systems&#039;&#039;, Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA, 1996. [http://ascend.cheme.cmu.edu/ftp/pdfThesis/safritthesis.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
M Thomas, &#039;&#039;Tool and Information Management in Engineering Design&#039;&#039;, Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA, 1996. [http://ascend.cheme.cmu.edu/ftp/pdfThesis/markthesis.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
R S Huss, &#039;&#039;Collocation Methods For Flexible Distillation Design&#039;&#039;, Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA, 1995. [http://ascend.cheme.cmu.edu/ftp/pdfThesis/bobthesis.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
J Zaher, &#039;&#039;Conditional Modeling&#039;&#039;, Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA, 1995. [http://ascend.cheme.cmu.edu/ftp/pdfThesis/joethesis.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
P Piela, &#039;&#039;ASCEND: An Object-Oriented Computer Environment for Modeling and Analysis&#039;&#039;, Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA, 1989. [http://ascend.cheme.cmu.edu/ftp/htbibliography/b9.html Abstract]&lt;br /&gt;
&lt;br /&gt;
== Journal papers ==&lt;br /&gt;
&lt;br /&gt;
A W Westerberg, 2003, &#039;&#039;A retrospective on design and process synthesis&#039;&#039;, Computers &amp;amp; Chemical Engineering, &#039;&#039;&#039;28&#039;&#039;&#039; (4), 447-458. {{doi|10.1016/j.compchemeng.2003.09.029}}.&lt;br /&gt;
&lt;br /&gt;
V Rico-Ramírez, B A Allan and A W Westerberg, 1999, &#039;&#039;Conditional Modeling. 1. Requirements for an Equation-Based Environment&#039;&#039;, Ind. Eng. Chem. Res., 38 (2), pp 519–530. {{doi|10.1021/ie9800593}}&lt;br /&gt;
&lt;br /&gt;
V Rico-Ramírez and A W Westerberg, 1999, &#039;&#039;Conditional Modeling. 2. Solving Using Complementarity and Boundary-Crossing Formulations&#039;&#039;, Ind. Eng. Chem. Res., 38 (2), pp 531–553. {{doi|10.1021/ie9800602}}&lt;br /&gt;
&lt;br /&gt;
Benjamin A Allan and Arthur W Westerberg, 1999, &#039;&#039;Anonymous Class in Declarative Process Modeling&#039;&#039;, Ind. Eng. Chem. Res., &#039;&#039;&#039;38&#039;&#039;&#039; (3), pp 692–704. {{doi|10.1021/ie980297y}}&lt;br /&gt;
&lt;br /&gt;
Kirk A Abbott, Benjamin A Allan, and Arthur W Westerberg, 1997, &#039;&#039;Global preordering for Newton equations using model hierarchy&#039;&#039;, AIChE Journal &#039;&#039;&#039;43&#039;&#039;&#039; (12), {{doi|10.1002/aic.690431207}}&lt;br /&gt;
&lt;br /&gt;
Robert S Huss and Arthur W Westerberg, 1996, &#039;&#039;Collocation Methods for Distillation Design. 2. Applications for Distillation&#039;&#039;, Ind. Eng. Chem. Res., &#039;&#039;&#039;35&#039;&#039;&#039; (5), pp 1611–1623, {{doi|10.1021/ie9503508}}&lt;br /&gt;
&lt;br /&gt;
B T Safrit, A W Westerberg, U Diwekar, O M Wahnschafft, 1995, &#039;&#039;Extending Continuous Conventional and Extractive Distillation Feasibility Insights to Batch Distillation&#039;&#039;, Ind. Eng. Chem. Res., 34, 3257-3264. {{doi|10.1021/ie00037a012}}&lt;br /&gt;
&lt;br /&gt;
B Katzenberg and P Piela, 1993, &#039;&#039;Work Language Analysis and the Naming Problem&#039;&#039;, Communications of the ACM, Vol. 36, No. 6, 86-92. {{doi|10.1145/153571.163286}}&lt;br /&gt;
&lt;br /&gt;
P Piela, R McKelvey, and A Westerberg, 1993, &#039;&#039;An Introduction to the ASCEND Modeling System: Its Language and Interactive Environment&#039;&#039;, Journal of Management Information Systems, Vol. 9, No. 3, 91-121. [http://ascend.cheme.cmu.edu/ftp/htbibliography/b6.html Abstract], [http://www.jstor.org/stable/40398044 Full article]&lt;br /&gt;
&lt;br /&gt;
P Piela, B Katzenberg, and R McKelvey, 1992, &#039;&#039;Integrating the User into Research on Engineering Design Systems ASCEND: An Object-Oriented Computer Environment for Modeling&#039;&#039;, Research in Engineering Design, Vol. 3, 211-221. {{doi|10.1007/BF01580843}}&lt;br /&gt;
&lt;br /&gt;
Oliver J Smith IV and Arthur W Westerberg, 1991, &#039;&#039;The optimal design of pressure swing adsorption systems&#039;&#039;, Chemical Engineering Science, &#039;&#039;&#039;48&#039;&#039;&#039; (12), {{doi|10.1016/0009-2509(91)85001-E}}&lt;br /&gt;
&lt;br /&gt;
P Piela, T Epperly, K Westerberg and A Westerberg, 1991, &#039;&#039;ASCEND: An Object-Oriented Computer Environment for Modeling and Analysis: The Modeling Language.&#039;&#039;, Computers and Chemical Engineering, Vol. 15, No. 1, 53-72. {{doi|10.1016/0098-1354(91)87006-U}}&lt;br /&gt;
&lt;br /&gt;
R F Woodbury, 1990, &#039;&#039;Variations in Solids: A Declarative Treatment&#039;&#039;, Computers and Graphics, Special Issue on Features and Geometric Reasoning, Vol. 14, No. 2, 173-188. {{doi|10.1016/0097-8493(90)90030-2}}&lt;br /&gt;
&lt;br /&gt;
S Kuru and A W Westerberg, 1985, &#039;&#039;A Newton-Raphson based strategy for exploiting latency in dynamic simulation&#039;&#039;, Computers &amp;amp; Chemical Engineering, Vol. 9, No. 2, {{doi|10.1016/0098-1354(85)85007-9}} or [[Media:Kuru1985.pdf|PDF]]&lt;br /&gt;
&lt;br /&gt;
A W Westerberg and D R Benjamin, 1985, &#039;&#039;Thoughts on a Future Equation-Oriented Flowsheeting System&#039;&#039;, Computers and Chemical Engineering, Vol. 9, No. 5, 517-526. {{doi|10.1016/0098-1354(85)80026-0}}&lt;br /&gt;
&lt;br /&gt;
Michael H Locke and Arthur W Westerberg, 1983, &#039;&#039;The ascend-II system—a flowsheeting application of a successive quadratic programming methodology&#039;&#039;, Computers &amp;amp; Chemical Engineering &#039;&#039;&#039;7&#039;&#039;&#039; (5) 615-630. {{doi|10.1016/0098-1354(83)80007-6}}&lt;br /&gt;
&lt;br /&gt;
A W Westerberg and S W Director, 1978, &#039;&#039;A modified least squares algorithm for solving sparse n × n sets of nonlinear equations&#039;&#039;, Computers and Chemical Engineering, Vol. 2, pp 77-81. {{doi|10.1016/0098-1354(78)80011-8}} (this paper describes the basis of our [[QRSlv]] solver)&lt;br /&gt;
&lt;br /&gt;
== Conference presentations and papers ==&lt;br /&gt;
A Sadrieh and P A Bahri, 2011, &#039;&#039;Application of Graphic Processing Unit in Model Predictive Control&#039;&#039;, Computer Aided Chemical Engineering, Volume 29, 21st European Symposium on Computer Aided Process Engineering, Elsevier, pages 492-496, ISSN 1570-7946, ISBN 9780444538956. {{doi|10.1016/B978-0-444-53711-9.50099-7}}&lt;br /&gt;
&lt;br /&gt;
J Pye, K Lovegrove and G Burgess. &#039;&#039;[http://solar-thermal.anu.edu.au/wp-content/uploads/pye-2010-solarpaces-combined-cycle.pdf Combined-cycle solarised gas turbine with steam, organic and CO2 bottoming cycles]&#039;&#039;, Proceedings of SolarPACES 2010, Perpignan, France, Sept 2010.&lt;br /&gt;
&lt;br /&gt;
J Coventry and J Pye, 2009, &#039;&#039;[http://stwp.cecs.anu.edu.au/wp-content/uploads/SolarPACES2009_CoventryPye.pdf Coupling supercritical and superheated direct steam generation with thermal energy storage]&#039;&#039;. SolarPACES 2009, Berlin, 15-18 Sept.&lt;br /&gt;
&lt;br /&gt;
H G Silva and R L R Salcedo, 2006, &#039;&#039;Modeling and Optimization of Chemical Processes: ASCEND IV and Stochastic Optimizers&#039;&#039;, Proceedings of [http://www.actapress.com/Content_Of_Proceeding.aspx?ProceedingID=381 Modelling and Simulation], Montreal, Canada.&lt;br /&gt;
&lt;br /&gt;
Allan, B.A. and A. W. Westerberg, &#039;&#039;Compiling and Solving 100,000 equations on a PC in (3) minutes&#039;&#039; Carnegie Mellon University, Pittsburgh, PA, 1998. Presented April 1998 at INFORMS Montreal conference. [http://ascend.cheme.cmu.edu/ftp/pdfPapersRptsSlides/informs98.pdf PDF].&lt;br /&gt;
&lt;br /&gt;
Bhargava, H., Krishnan, R., and Piela, P., &#039;&#039;Formalizing the Semantics of ASCEND&#039;&#039;, Proceedings of the 27th Hawaii International Conference on System Sciences, IEEE Computer Society Press, 1994. &lt;br /&gt;
[http://ascend.cheme.cmu.edu/ftp/htbibliography/b1.html Abstract], {{doi|10.1109/HICSS.1994.323312}}.&lt;br /&gt;
&lt;br /&gt;
Krishnan, R., Piela, P., and Westerberg, A., &#039;&#039;Reusing Mathematical Models in ASCEND&#039;&#039;, Recent Developments in Decision Support Systems, C. W. Holsapple and A. B. Whinston (eds.). Proceedings of the NATO ASI on Decision Support Systems. Springer-Verlag, in cooperation with NATO Scientific Affairs Division, 1993. [http://ascend.cheme.cmu.edu/ftp/htbibliography/b5.html Abstract], read [http://books.google.com.au/books?id=UefgA0jltcgC&amp;amp;pg=PA275 via Google Books].&lt;br /&gt;
&lt;br /&gt;
Westerberg, A.W., Abbott, K.A., and Allan, B.A., &#039;&#039;Plans for ASCEND IV: Our Next Generation Equational-Based Modeling Environment&#039;&#039;, AspenWorld 94, Boston, Massachusetts, November 1994. [http://ascend.cheme.cmu.edu/ftp/htbibliography/b10.html Abstract].&lt;br /&gt;
&lt;br /&gt;
Westerberg, A., Piela, P., McKelvey, R., and Epperly, T., &#039;&#039;The ASCEND Modeling Environment and Its Implications&#039;&#039;, Proceedings of the 4th International Symposium on Process Systems Engineering, I: Design, I.2.1 - I.2.12, 1991. [http://ascend.cheme.cmu.edu/ftp/htbibliography/ba.html Abstract].&lt;br /&gt;
&lt;br /&gt;
Westerberg, A., Piela, P., Subrahmanian, E., Podnar, G., and Elm, W., &#039;&#039;A Future Computer Environment for Preliminary Design&#039;&#039;, Proceedings of the Third International Conference on Foundations of Computer Aided Process Design, J. Siirola, I. Grossmann, and G. Stephanopoulos (eds.), 1989. [http://ascend.cheme.cmu.edu/ftp/htbibliography/bb.html Abstract]&lt;br /&gt;
&lt;br /&gt;
== Technical reports ==&lt;br /&gt;
&lt;br /&gt;
L Cisternas, N Luza, E Gálvez, 2007, &#039;&#039;[[Media:HydroSim-reporte-tecnico.pdf|HydroSim: Una Librería para Simulación, Modelación y Optimización en Hidrometalurgia]]&#039;&#039; (in Spanish; English title: &#039;&#039;HydroSim: a library for simulation, modelling and optimisation in hydrometallurgy&#039;&#039;), Technical report, Universidad de Antofagasta, Design of Product and Process Group and CICITEM. &#039;&#039;&#039;PDF&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
J L Perry and B A Allan, 1996, &#039;&#039;[http://ascend.cheme.cmu.edu/ftp/pdfPapersRptsSlides/AscendIVP.pdf Design and Use of Dynamic Modeling in ASCEND IV]&#039;&#039;, Carnegie Mellon University, EDRC Technical Report 06-224-96. &#039;&#039;&#039;PDF&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
P Piela and A W Westerberg, 1994, &#039;&#039;[http://ascend.cheme.cmu.edu/ftp/pdfPapersRptsSlides/processModeling.pdf Equation-Based Process Modeling]&#039;&#039;, Carnegie Mellon University, EDRC Technical Report. &#039;&#039;&#039;PDF&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
A W Westerberg, 1994, &#039;&#039;[http://ascend.cheme.cmu.edu/ftp/pdfPapersRptsSlides/ascendIntro.pdf ASCEND Modeling Language and Environment Notes]&#039;&#039;, CAPD Short Course Notes, Carnegie Mellon University.&lt;br /&gt;
&lt;br /&gt;
T Epperly, 1989, [[Media:epperlyPaperICES05-29-89.pdf|&#039;&#039;Implementation of an ASCEND Interpreter&#039;&#039;]], Carnegie Mellon University, EDRC Technical Report 05-29-89, &#039;&#039;&#039;PDF&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Allan, B.A. and A. W. Westerberg, &#039;&#039;[http://ascend.cheme.cmu.edu/ftp/pdfPapersRptsSlides/aiche97-201f.pdf Reusability and Scalability in Modeling] (Previously titled &amp;quot;Reusability in Modeling and Modeling Support Environments&amp;quot;)&#039;&#039;, Carnegie Mellon University, ICES Technical Report, November 1997. Pittsburgh, PA, 1997. (AIChE 1997 Meeting Paper 201f.) (see also [http://ascend.cheme.cmu.edu/ftp/pdfPapersRptsSlides/slides-201f.pdf presentation slides])&lt;br /&gt;
&lt;br /&gt;
Allan, B.A. and Westerberg, A.W., &#039;&#039;The ASCEND IV Language Syntax and Semantics&#039;&#039;, Carnegie Mellon University, EDRC Technical Report, 1997. (this report is now edited and incorporated into our [[:Category:Documentation|documentation]]).&lt;br /&gt;
&lt;br /&gt;
Allan, B.A., Rico-Ramirez, V., Thomas, M., and Tyner, K., &#039;&#039;ASCEND IV: A portable Mathematical Modeling Environment&#039;&#039;, Carnegie Mellon University, ICES Technical Report, October 1996. [http://ascend.cheme.cmu.edu/ftp/pdfHelp/ascend-help-BOOK-3.pdf PDF].&lt;br /&gt;
&lt;br /&gt;
Dee, K. and Westerberg, A., &#039;&#039;CEPHDA: Chemical Engineering Process Hierarchical Design with ASCEND&#039;&#039;, Carnegie Mellon University, EDRC Technical Report 06-140-92, 1992 [http://ascend.cheme.cmu.edu/ftp/htbibliography/b2.html Abstract].&lt;br /&gt;
&lt;br /&gt;
Rico-Ramirez, V., Allan, B.A. and Westerberg, A.W., &#039;&#039;Conditional Modeling in an Equation-Based Environment&#039;&#039;, Carnegie Mellon University, ICES Technical Report 06-242-98, 1998. [http://ascend.cheme.cmu.edu/ftp/pdfPapersRptsSlides/tech_modeling.pdf PDF].&lt;br /&gt;
&lt;br /&gt;
Rico-Ramirez, V. and Westerberg, A.W., &#039;&#039;Complementarity Formulation for the Representation of Algebraic Systems Containing Conditional Equations&#039;&#039;, Carnegie Mellon University, ICES Technical Report 06-243-98, 1998. [http://ascend.cheme.cmu.edu/ftp/pdfPapersRptsSlides/tech_complementarity.pdf PDF].&lt;br /&gt;
&lt;br /&gt;
Westerberg, K., &#039;&#039;Development of Software for Solving Systems of Nonlinear Equations&#039;&#039;, Carnegie Mellon University, EDRC Technical Report 05-36-89, Pittsburgh, PA, 1989. [http://ascend.cheme.cmu.edu/ftp/pdfPapersRptsSlides/nonlinear_rpt.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
Westerberg, K., &#039;&#039;Development of Software for Solving Systems of Linear Equations&#039;&#039;, Carnegie Mellon University, EDRC Technical Report 05-35-89, 1989. [http://ascend.cheme.cmu.edu/ftp/pdfPapersRptsSlides/linear_rpt.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
Zaher, J., &#039;&#039;Developing Reusable Libraries in the ASCEND environment&#039;&#039;, Carnegie Mellon University, EDRC Technical Report 06-108-91, 1991. [http://ascend.cheme.cmu.edu/ftp/htbibliography/bf.html Abstract].&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2718</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2718"/>
		<updated>2011-06-24T05:46:02Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Progress */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use your solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to the “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernel memory access pattern to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture constant memory&lt;br /&gt;
*** Change the memory management model from the standard model to PINNED memory management. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU-based evaluation instead of purely GPU-based evaluation. By doing this, the CPU can be used for the small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
*** support for models containing &#039;external relations&#039;&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert current kernels from 2D kernels into 3D; the extra dimension is used for each input vector.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns its index&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach to QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate batch multi-vector evaluator into QRCUDA line search.&lt;br /&gt;
** Modify current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix Bintoken auto rebuild sensing feature in the PyGTK&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** ensuring all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** implement testing of CUDA hardware availability when the solver is first loaded; only make QRCUDA available if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
*  Test the project with different hardware and software platforms.&lt;br /&gt;
** testing of memory leakage and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard to PINNED. This makes data transfer between host and device twice as fast.&lt;br /&gt;
** Batch evaluator can now perform hybrid CPU/GPU evaluations so that the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Clean-up of the prototype&lt;br /&gt;
** The GPU init and shutdown methods were moved to QRCUDA.&lt;br /&gt;
** The dependency on the common makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** The Linux version of BinCUDA&#039;s makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented&lt;br /&gt;
** A new form was added to the main GUI that shows information about the CUDA-enabled devices in the system (speed, number of cores, max memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unload bug was fixed during the clean-ups.&lt;br /&gt;
* After 16-June&lt;br /&gt;
** More clean-ups in the BinCUDAs.&lt;br /&gt;
** The active block evaluation mechanism was added to the batch evaluator.&lt;br /&gt;
** QRCUDA now uses GPU-based model evaluation for the residual evaluation in large blocks (the code was tested on {{srcbranchdir|arash|models/test/bintok/test2.a4c}}; more testing is required).&lt;br /&gt;
** QRCUDA was tested with {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} and, after some bug fixes, the GPU evaluator results are now identical to those of the standard calc_residuals method.&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
A list of ideas and issues with the current implementation is provided as follows (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is consumed by the rel_set_residual() calls. How can we optimize this function?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine should have an NVIDIA CUDA-enabled GPU (preferably Fermi or a more recent architecture). The card must support double-precision floating-point calculations (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver should be installed on the host machine, and BinCUDA&#039;s Makefile must be pointed at the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following gives step-by-step instructions for installing the CUDA SDK on an Ubuntu 10.04 32-bit machine. The installation process on other Linux distributions is quite similar; however, the Ubuntu 10.04 32-bit file addresses used below should be replaced with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, issue:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, and then stop X, run the driver installer and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver, and you should then be able to install the CUDA 3.2 toolkit and SDK samples (using the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, is recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) Now you should be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples will be created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK, the CUDA_INSTALL_PATH variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) should be set to the SDK directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be adjusted by multiplying the two parameters, 100 and 51, by the same factor. For example, in {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} both are multiplied by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level&lt;br /&gt;
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, GPU-based evaluation &lt;br /&gt;
time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, and the Makefile located in that same directory &lt;br /&gt;
is responsible for building the shared binary object for BinCUDAs. The CUDA &lt;br /&gt;
build and compile commands are provided in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro&lt;br /&gt;
in the code. [Note that if your model uses any ASCEND-specific&lt;br /&gt;
function (e.g. asc_ipow), that function must also be defined in the &lt;br /&gt;
btcudapl.cu file ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}).]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2703</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2703"/>
		<updated>2011-06-22T10:24:54Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Progress */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use your solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to the “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernel memory access pattern to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture constant memory&lt;br /&gt;
*** Change the memory management model from the standard model to PINNED memory management. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU-based evaluation instead of purely GPU-based evaluation. By doing this, the CPU can be used for the small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
*** support for models containing &#039;external relations&#039;&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current kernels from 2D to 3D; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns its index&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in the PyGTK GUI&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** Implement testing of CUDA hardware availability when the solver is first loaded; only make QRCUDA available if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard (pageable) to pinned (page-locked) memory. This makes data transfer between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluations, so the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Clean-up of the prototype.&lt;br /&gt;
** The GPU init and shutdown methods were moved to QRCUDA.&lt;br /&gt;
** The dependency on the common makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** The Linux version of BinCUDA&#039;s makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented.&lt;br /&gt;
** A new form was added to the main GUI that shows information about the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unload bug was fixed during the clean-ups.&lt;br /&gt;
* After 16-June&lt;br /&gt;
** More clean-ups in the BinCUDAs.&lt;br /&gt;
** The active block evaluation mechanism was added to the batch evaluator.&lt;br /&gt;
** QRCUDA now uses GPU-based model evaluation for the residual evaluation in large blocks (the code was tested on {{srcbranchdir|arash|ascend/models/test2.a4c}}; more testing is required).&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
A list of ideas and issues with the current implementation is provided as follows (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. How can this function be optimized?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably with the Fermi architecture or a more recent one). The card must support double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and BinCUDA&#039;s Makefile must point to the SDK directory (see BinCUDA Makefile settings).&lt;br /&gt;
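&lt;br /&gt;
Once the SDK is installed (see below), you can check the compute capability of each card with the deviceQuery sample that ships with it. It is built together with the other samples, e.g. ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery, and reports the capability as a major and minor revision number. As a rough sketch, those two numbers can be passed to a small shell helper that succeeds only when the capability is 1.3 or higher. The function name supports_double is hypothetical and not part of ASCEND or the SDK:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# Hypothetical helper: succeeds if compute capability MAJOR.MINOR is at least 1.3,&lt;br /&gt;
# the minimum for double-precision support (compute_13+).&lt;br /&gt;
supports_double() {&lt;br /&gt;
  major=$1&lt;br /&gt;
  minor=$2&lt;br /&gt;
  [ &amp;quot;$major&amp;quot; -gt 1 ] || { [ &amp;quot;$major&amp;quot; -eq 1 ] &amp;amp;&amp;amp; [ &amp;quot;$minor&amp;quot; -ge 3 ]; }&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Example: Fermi cards report compute capability 2.x, e.g. 2.0&lt;br /&gt;
if supports_double 2 0; then&lt;br /&gt;
  echo &amp;quot;double precision supported&amp;quot;&lt;br /&gt;
else&lt;br /&gt;
  echo &amp;quot;double precision NOT supported&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;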
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; replace the Ubuntu 10.04 32-bit file addresses below with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1 and log in. Then run the following to stop the X server (gdm), install the driver and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. Next, install the CUDA 3.2 toolkit and the SDK samples (the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, are recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) Open a new terminal (or run source ~/.bashrc) so the new settings take effect. You should then be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
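&lt;br /&gt;
The settings from step 4 only apply to new shells, so it helps to confirm them in a fresh terminal. The following sketch checks that the CUDA directories appear in PATH and LD_LIBRARY_PATH. The helper name in_pathlist is hypothetical, and the directories assume the default 32-bit install locations used above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# Hypothetical helper: succeeds if directory $1 is one of the&lt;br /&gt;
# colon-separated entries in the list $2.&lt;br /&gt;
in_pathlist() {&lt;br /&gt;
  case &amp;quot;:$2:&amp;quot; in&lt;br /&gt;
    *&amp;quot;:$1:&amp;quot;*) return 0 ;;&lt;br /&gt;
    *) return 1 ;;&lt;br /&gt;
  esac&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
in_pathlist /usr/local/cuda/bin &amp;quot;$PATH&amp;quot; || echo &amp;quot;warning: /usr/local/cuda/bin is not in PATH&amp;quot;&lt;br /&gt;
in_pathlist /usr/local/cuda/lib &amp;quot;$LD_LIBRARY_PATH&amp;quot; || echo &amp;quot;warning: /usr/local/cuda/lib is not in LD_LIBRARY_PATH&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
On 64-bit systems the toolkit libraries are normally installed in /usr/local/cuda/lib64, so check that directory instead (and add it in step 4).&lt;br /&gt;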
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK, set the CUDA_INSTALL_PATH variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to point to the SDK directory.&lt;br /&gt;
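&lt;br /&gt;
One way to make this change without opening an editor is a sed substitution. The sketch below assumes GNU sed (for the -i option) and that the Makefile assigns CUDA_INSTALL_PATH on a single line beginning with the variable name. The helper name set_cuda_path is hypothetical, and the directory you pass should be wherever your installation actually lives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# Hypothetical helper: rewrite the CUDA_INSTALL_PATH assignment (=, := or ?=)&lt;br /&gt;
# in Makefile $1 so that it points to directory $2.&lt;br /&gt;
set_cuda_path() {&lt;br /&gt;
  sed -i &amp;quot;s|^CUDA_INSTALL_PATH[[:space:]]*[:?]*=.*|CUDA_INSTALL_PATH := $2|&amp;quot; &amp;quot;$1&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Usage from the root of the arash branch checkout, e.g.:&lt;br /&gt;
# set_cuda_path ascend/bintokens/bincuda/Makefile /usr/local/cuda&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you invoke make by hand, GNU make also accepts a command-line override such as make CUDA_INSTALL_PATH=/usr/local/cuda. This takes precedence over an ordinary assignment in the Makefile and leaves the file untouched.&lt;br /&gt;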
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be adjusted by scaling the two parameters, 100 and 51, by the same multiplicative factor. For example, {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} uses a factor of 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
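&lt;br /&gt;
For other sizes, the two scaled arguments can be computed with a trivial shell helper. This is only a convenience sketch: the name scaled_column_args is hypothetical, and it computes the demo_column parameters, not the resulting number of equations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# Hypothetical helper: scale the base demo_column arguments (100 and 51)&lt;br /&gt;
# by the integer factor $1 and print them comma-separated.&lt;br /&gt;
scaled_column_args() {&lt;br /&gt;
  echo &amp;quot;$((100 * $1)),$((51 * $1))&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
scaled_column_args 5   # prints 500,255&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;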
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level&lt;br /&gt;
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation&lt;br /&gt;
time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, and a Makefile in the same directory&lt;br /&gt;
builds the shared binary object for the BinCUDAs. The CUDA&lt;br /&gt;
build and compile commands are provided in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro&lt;br /&gt;
in the code. [Please note that if your model uses any ASCEND-specific&lt;br /&gt;
function (e.g. asc_ipow), that function must also be defined in the&lt;br /&gt;
btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}) file.]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2702</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2702"/>
		<updated>2011-06-22T10:23:04Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* BinCUDA Makefile settings */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator.&lt;br /&gt;
* Integrate the approach into QRCUDA.&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use the solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to the “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernel memory access pattern to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture/constant memory&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation. This lets the CPU evaluate the small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
*** Support for models containing &#039;external relations&#039;&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current kernels from 2D to 3D; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns its index&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in the PyGTK GUI&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** Implement testing of CUDA hardware availability when the solver is first loaded; only make QRCUDA available if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard (pageable) to pinned (page-locked) memory. This makes data transfer between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluations, so the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Clean-up of the prototype.&lt;br /&gt;
** The GPU init and shutdown methods were moved to QRCUDA.&lt;br /&gt;
** The dependency on the common makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** The Linux version of BinCUDA&#039;s makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented.&lt;br /&gt;
** A new form was added to the main GUI that shows information about the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unload bug was fixed during the clean-ups.&lt;br /&gt;
* After 16-June&lt;br /&gt;
** Clean-up in BinCUDAs.&lt;br /&gt;
** The active block evaluation mechanism was added to the batch evaluator.&lt;br /&gt;
** QRCUDA now uses GPU-based model evaluation for the residual evaluation in large blocks (the code was tested on {{srcbranchdir|arash|ascend/models/test2.a4c}}; more testing is necessary).&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
A list of ideas and issues with the current implementation is provided as follows (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. How can this function be optimized?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably with the Fermi architecture or a more recent one). The card must support double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and BinCUDA&#039;s Makefile must point to the SDK directory (see BinCUDA Makefile settings).&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; replace the Ubuntu 10.04 32-bit file addresses below with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1 and log in. Then run the following to stop the X server (gdm), install the driver and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. Next, install the CUDA 3.2 toolkit and the SDK samples (the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, are recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) Open a new terminal (or run source ~/.bashrc) so the new settings take effect. You should then be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK, set the CUDA_INSTALL_PATH variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to point to the SDK directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be adjusted by scaling the two parameters, 100 and 51, by the same multiplicative factor. For example, {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} uses a factor of 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level&lt;br /&gt;
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation&lt;br /&gt;
time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, and a Makefile in the same directory&lt;br /&gt;
builds the shared binary object for the BinCUDAs. The CUDA&lt;br /&gt;
build and compile commands are provided in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro&lt;br /&gt;
in the code. [Please note that if your model uses any ASCEND-specific&lt;br /&gt;
function (e.g. asc_ipow), that function must also be defined in the&lt;br /&gt;
btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}) file.]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2701</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2701"/>
		<updated>2011-06-22T10:19:46Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Progress */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator.&lt;br /&gt;
* Integrate the approach into QRCUDA.&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use the solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to the “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernel memory access pattern to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture/constant memory&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation. This lets the CPU evaluate the small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
*** Support for models containing &#039;external relations&#039;&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current kernels from 2D to 3D; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns its index&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in the PyGTK GUI&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** Implement testing of CUDA hardware availability when the solver is first loaded; only make QRCUDA available if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard (pageable) to pinned (page-locked) memory. This makes data transfer between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluations, so the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Clean-up of the prototype.&lt;br /&gt;
** The GPU init and shutdown methods were moved to QRCUDA.&lt;br /&gt;
** The dependency on the common makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** The Linux version of BinCUDA&#039;s makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented.&lt;br /&gt;
** A new form was added to the main GUI that shows information about the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unload bug was fixed during the clean-ups.&lt;br /&gt;
* After 16-June&lt;br /&gt;
** Clean-up in BinCUDAs.&lt;br /&gt;
** The active block evaluation mechanism was added to the batch evaluator.&lt;br /&gt;
** QRCUDA now uses GPU-based model evaluation for the residual evaluation in large blocks (the code was tested on {{srcbranchdir|arash|ascend/models/test2.a4c}}; more testing is necessary).&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
A list of ideas and issues with the current implementation is provided as follows (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. How can this function be optimized?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably with the Fermi architecture or a more recent one). The card must support double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and BinCUDA&#039;s Makefile must point to the SDK directory (see BinCUDA Makefile settings).&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; replace the Ubuntu 10.04 32-bit file addresses below with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) Download the developer driver and make the installer executable. In a terminal window, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
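2) Switch to a text console by pressing CTRL+ALT+F1, log in, change to the directory containing the downloaded driver, and run the following to stop the X server (GDM), install the driver and restart X:&lt;br /&gt;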
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. Next, install the CUDA 3.2 toolkit and the SDK samples (the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, are recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to the ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
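&lt;br /&gt;
As a possible refinement (a sketch, not part of the original instructions): when LD_LIBRARY_PATH is initially empty, the lines above produce a value that starts with a colon, and an empty entry in these search paths is typically interpreted as the current directory. The hypothetical shell function append_path below (not part of ASCEND or the CUDA toolkit) appends a directory only when it is missing and avoids the stray colon; it assumes a POSIX-compatible shell such as bash.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of ASCEND or the CUDA toolkit): print the
# colon-separated list given in $1 with directory $2 appended, unless $2
# is already present. An empty list yields just $2, so no leading colon.
append_path() {
  case ":$1:" in
    *":$2:"*) printf '%s\n' "$1" ;;
    *) printf '%s\n' "${1:+$1:}$2" ;;
  esac
}

# Possible replacement for the ~/.bashrc lines in step 4.
PATH=$(append_path "$PATH" /usr/local/cuda/bin)
LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib)
export PATH LD_LIBRARY_PATH
```

Because the function checks for the directory first, sourcing ~/.bashrc several times does not keep growing the variables.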
&lt;br /&gt;
5) Open a new terminal (or run source ~/.bashrc) so that the new variables take effect, then compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created under the &amp;quot;bin&amp;quot; directory. For example, you should now be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
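&lt;br /&gt;
The same directory should also contain the deviceQuery sample, which lists each CUDA device together with its compute capability. The double-precision requirement from the Installation section (compute capability 1.3 or later) can then be checked by hand or with a small helper. The function supports_double below is a hypothetical sketch (not part of ASCEND or the SDK): it takes the major and minor capability numbers reported by deviceQuery and succeeds only for 1.3 and above.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: succeed if compute capability $1.$2 (major.minor)
# supports double precision, i.e. is 1.3 or later.
supports_double() {
  if [ "$1" -gt 1 ]; then
    return 0
  elif [ "$1" -eq 1 ]; then
    [ "$2" -ge 3 ]
  else
    return 1
  fi
}

# Example: a Fermi card reports compute capability 2.0.
if supports_double 2 0; then
  echo "double precision supported"
else
  echo "double precision NOT supported"
fi
```

The helper compares the major number first, so a 2.0 device is accepted even though its minor number is below 3.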
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations can be adjusted by multiplying the last two parameters of demo_column (100 and 51) by the same factor. For example, {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} multiplies both by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be instantiated instead of a single one ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case reports the CPU-based evaluation time, the GPU-based evaluation time and the number of equations in the model.&lt;br /&gt;
It writes the generated code to the &amp;quot;/tmp&amp;quot; directory, where a Makefile builds the shared binary object for the BinCUDAs. The CUDA build and compile commands are defined in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro in the test code. Note that if your model uses any ASCEND-specific function (e.g. asc_ipow), that function must also be defined in the btcudapl.cu file ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}).&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2589</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2589"/>
		<updated>2011-06-15T09:30:03Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Progress */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use the solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the memory access pattern of the kernels to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture or constant memory&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of GPU-only evaluation, so that the CPU can evaluate the small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
*** Support models containing &#039;external relations&#039;&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator&lt;br /&gt;
** Research the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current kernels from 2D to 3D; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns its index&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild detection feature in the PyGTK GUI&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** Test for CUDA hardware availability when the solver is first loaded; make QRCUDA available only if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard (pageable) to pinned memory. This makes data transfer between host and device twice as fast.&lt;br /&gt;
** Batch evaluator can now perform hybrid CPU/GPU evaluations so that the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Cleaned up the prototype.&lt;br /&gt;
** The GPU init and shutdown methods were moved to QRCUDA.&lt;br /&gt;
** The dependency on the common Makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** A Linux version of BinCUDA&#039;s Makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented.&lt;br /&gt;
** A new information form was added to the main GUI, showing details of the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
** The BinCUDA unload bug was fixed during the cleanup.&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
Ideas and open issues with the current implementation are listed below (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. How can this function be optimized?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine must have an NVIDIA CUDA-enabled GPU (preferably of the Fermi or a more recent architecture). The card must support double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA developer driver, toolkit and SDK must be installed on the host machine, and BinCUDA&#039;s Makefile must be pointed at the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following steps install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The process on other Linux distributions is quite similar; replace the Ubuntu 10.04 32-bit download addresses used below with the equivalent files for your distribution from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) Download the developer driver and make the installer executable. In a terminal window, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
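2) Switch to a text console by pressing CTRL+ALT+F1, log in, change to the directory containing the downloaded driver, and run the following to stop the X server (GDM), install the driver and restart X:&lt;br /&gt;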
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. Next, install the CUDA 3.2 toolkit and the SDK samples (the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, are recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to the ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) Open a new terminal (or run source ~/.bashrc) so that the new variables take effect, then compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created under the &amp;quot;bin&amp;quot; directory. For example, you should now be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations can be adjusted by multiplying the last two parameters of demo_column (100 and 51) by the same factor. For example, {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} multiplies both by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be instantiated instead of a single one ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case reports the CPU-based evaluation time, the GPU-based evaluation time and the number of equations in the model.&lt;br /&gt;
It writes the generated code to the &amp;quot;/tmp&amp;quot; directory, where a Makefile builds the shared binary object for the BinCUDAs. The CUDA build and compile commands are defined in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro in the test code. Note that if your model uses any ASCEND-specific function (e.g. asc_ipow), that function must also be defined in the btcudapl.cu file ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}).&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2588</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2588"/>
		<updated>2011-06-15T09:28:35Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Progress */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use the solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the memory access pattern of the kernels to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture or constant memory&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of GPU-only evaluation, so that the CPU can evaluate the small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
*** Support models containing &#039;external relations&#039;&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator&lt;br /&gt;
** Research the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current kernels from 2D to 3D; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns its index&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild detection feature in the PyGTK GUI&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** Test for CUDA hardware availability when the solver is first loaded; make QRCUDA available only if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard (pageable) to pinned memory. This makes data transfer between host and device twice as fast.&lt;br /&gt;
** Batch evaluator can now perform hybrid CPU/GPU evaluations so that the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable.&lt;br /&gt;
* After 6-June&lt;br /&gt;
** Cleaned up the prototype.&lt;br /&gt;
** The GPU init and shutdown methods were moved to QRCUDA.&lt;br /&gt;
** The dependency on the common Makefile and headers (located in the SDK samples) was removed.&lt;br /&gt;
** A Linux version of BinCUDA&#039;s Makefile was created (Windows and Mac OS versions are coming soon).&lt;br /&gt;
** A test case for QRCUDA was implemented.&lt;br /&gt;
** A new information form was added to the main GUI, showing details of the CUDA-enabled devices in the system (speed, number of cores, maximum memory, number of multiprocessors, etc.).&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
Ideas and open issues with the current implementation are listed below (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. How can this function be optimized?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine must have an NVIDIA CUDA-enabled GPU (preferably of the Fermi or a more recent architecture). The card must support double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA developer driver, toolkit and SDK must be installed on the host machine, and BinCUDA&#039;s Makefile must be pointed at the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following steps install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The process on other Linux distributions is quite similar; replace the Ubuntu 10.04 32-bit download addresses used below with the equivalent files for your distribution from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) Download the developer driver and make the installer executable. In a terminal window, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
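2) Switch to a text console by pressing CTRL+ALT+F1, log in, change to the directory containing the downloaded driver, and run the following to stop the X server (GDM), install the driver and restart X:&lt;br /&gt;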
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. Next, install the CUDA 3.2 toolkit and the SDK samples (the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, are recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to the ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) Open a new terminal (or run source ~/.bashrc) so that the new variables take effect, then compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created under the &amp;quot;bin&amp;quot; directory. For example, you should now be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations can be adjusted by multiplying the last two parameters of demo_column (100 and 51) by the same factor. For example, {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} multiplies both by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be instantiated instead of a single one ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation time and the number of equations in the model.&lt;br /&gt;
It writes the generated code to the &amp;quot;/tmp&amp;quot; directory, where a Makefile builds the BinCUDA shared binary object. The CUDA build and compile commands are provided in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
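&lt;br /&gt;
The CPU and GPU timings reported by the test can be combined into a speedup factor, i.e. CPU time divided by GPU time. The following is a minimal sketch. It assumes both times are copied by hand from the test output and use the same unit, since the exact output format is not described here. The function name speedup is hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/sh
# Hypothetical helper: print the GPU speedup for one benchmark run,
# i.e. CPU evaluation time ($1) divided by GPU evaluation time ($2),
# rounded to two decimal places. Both times must use the same unit.
# Exits with status 1 if the GPU time is not a positive number.
speedup() {
    awk -v cpu="$1" -v gpu="$2" 'BEGIN {
        if (!(gpu > 0)) { print "invalid GPU time"; exit 1 }
        printf "%.2f\n", cpu / gpu
    }'
}
```
&lt;br /&gt;
A value above 1.00 means the GPU-based evaluation was faster than the CPU-based evaluation for that model.&lt;br /&gt;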
&lt;br /&gt;
To use a different benchmark model, change the FILENAMESTEM macro in the test code. Note that if your model uses any ASCEND-specific function (e.g. asc_ipow), that function must also be defined in btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}).&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2571</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2571"/>
		<updated>2011-06-09T12:14:43Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Ideas and Issues */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use the solver&lt;br /&gt;
** General architecture improvements&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test into “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernel memory access pattern to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture/constant memory&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation, so that the CPU can evaluate the small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
*** Support models containing &#039;external relations&#039;&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current kernels from 2D to 3D; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns its index&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in the PyGTK GUI&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** Test for CUDA hardware availability when the solver is first loaded; make QRCUDA available only if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard to pinned (page-locked) memory. This makes data transfers between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluations, so the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable.&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
Ideas and issues with the current implementation are listed below (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is consumed in the rel_set_residual() calls. How can we optimize this function?&lt;br /&gt;
# Can the solver provide cheap feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or later). The card must support double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA developer driver, toolkit and SDK must be installed on the host machine, and the BinCUDA Makefile must be pointed at the SDK directory.&lt;br /&gt;
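&lt;br /&gt;
You can check whether a card meets the double-precision requirement from its compute capability. The following is a minimal sketch. It assumes the capability is already known as a MAJOR.MINOR string (for example, as reported by the deviceQuery sample in the CUDA SDK). The function name supports_double is hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if the CUDA compute capability
# given as MAJOR.MINOR (e.g. 1.3 or 2.0) is at least 1.3 (compute_13),
# the first level with double-precision support; status 1 otherwise.
supports_double() {
    major=${1%%.*}
    minor=${1#*.}
    # A value without a dot (e.g. "2") is treated as MAJOR.0.
    if [ "$minor" = "$1" ]; then
        minor=0
    fi
    if [ "$major" -gt 1 ]; then
        return 0
    fi
    if [ "$major" -lt 1 ]; then
        return 1
    fi
    # Major version 1: double precision requires minor version 3 or more.
    [ "$minor" -ge 3 ]
}
```
&lt;br /&gt;
For example, supports_double 1.3 and supports_double 2.0 succeed, while supports_double 1.1 fails.&lt;br /&gt;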
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar, but the Ubuntu 10.04 32-bit download addresses below should be replaced with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
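2) Switch to a text console by pressing CTRL+ALT+F1 and log in. Then stop the X server (via gdm), run the driver installer from the directory where it was downloaded, and restart gdm:&lt;br /&gt;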
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. Next, install the CUDA 3.2 toolkit and the SDK samples (using the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, is recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to the ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) Open a new terminal (or run source ~/.bashrc) so that the new settings take effect. You should then be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
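&lt;br /&gt;
Every new interactive shell reads ~/.bashrc again, including shells started from another shell. As a result, the lines in step 4 can add the CUDA directories to PATH and LD_LIBRARY_PATH more than once. The following is a minimal sketch of an alternative that appends a directory only when it is missing. The function name append_path is hypothetical. The two calls at the end would replace the lines from step 4:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper for ~/.bashrc: append directory $2 to the
# colon-separated variable named $1 (e.g. PATH), unless it is already listed.
append_path() {
    eval "current=\${$1}"
    case ":$current:" in
        *":$2:"*) ;;  # already present: leave the variable unchanged
        *)
            if [ -n "$current" ]; then
                eval "$1=\$current:\$2"
            else
                eval "$1=\$2"
            fi
            ;;
    esac
    export "$1"
}

append_path PATH /usr/local/cuda/bin
append_path LD_LIBRARY_PATH /usr/local/cuda/lib
```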
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
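&lt;br /&gt;
The CUDA_SAMPLES assignment can also be updated from the command line. The following is a minimal sketch. It assumes the Makefile sets the variable on a single line of the form CUDA_SAMPLES = path; the exact form used in the BinCUDA Makefile is an assumption. The function name set_cuda_samples is hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/sh
# Hypothetical helper: point the CUDA_SAMPLES assignment in Makefile $1
# at SDK directory $2. Matches a line starting with CUDA_SAMPLES followed
# by =, := or ?= and rewrites it as a plain "CUDA_SAMPLES = ..." line;
# all other lines are copied unchanged. $2 must not contain a | character.
set_cuda_samples() {
    makefile=$1
    sdk_dir=$2
    if [ ! -d "$sdk_dir" ]; then
        echo "SDK directory not found: $sdk_dir"
        return 1
    fi
    tmp=$(mktemp) || return 1
    if ! sed "s|^CUDA_SAMPLES[[:space:]]*[:?]*=.*|CUDA_SAMPLES = $sdk_dir|" "$makefile" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    mv "$tmp" "$makefile"
}
```
&lt;br /&gt;
For the default installation above, a call might look like set_cuda_samples ascend/bintokens/bincuda/Makefile $HOME/NVIDIA_GPU_Computing_SDK, run from the top-level ASCEND directory. Check the Makefile to confirm whether it expects the SDK root or its C subdirectory.&lt;br /&gt;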
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be adjusted by multiplying the two size parameters, 100 and 51, by the same factor. For example, {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} scales both by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation time and the number of equations in the model.&lt;br /&gt;
It writes the generated code to the &amp;quot;/tmp&amp;quot; directory, where a Makefile builds the BinCUDA shared binary object. The CUDA build and compile commands are provided in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To use a different benchmark model, change the FILENAMESTEM macro in the test code. Note that if your model uses any ASCEND-specific function (e.g. asc_ipow), that function must also be defined in btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}).&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2565</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2565"/>
		<updated>2011-06-09T09:44:04Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* To-do list */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use the solver&lt;br /&gt;
** General architecture improvements&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test into “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernel memory access pattern to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture/constant memory&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation, so that the CPU can evaluate the small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
*** Support models containing &#039;external relations&#039;&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current kernels from 2D to 3D; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns its index&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in the PyGTK GUI&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** Test for CUDA hardware availability when the solver is first loaded; make QRCUDA available only if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard to pinned (page-locked) memory. This makes data transfers between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluations, so the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable.&lt;br /&gt;
&lt;br /&gt;
== Ideas and Issues ==&lt;br /&gt;
&lt;br /&gt;
Ideas and issues with the current implementation are listed below (comments and critiques are greatly appreciated):&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is consumed in the rel_set_residual() calls. How can we optimize this function?&lt;br /&gt;
# Can the solver cheaply provide feedback to the user showing the degree of parallelism that was achieved during a particular model solution?&lt;br /&gt;
# Sometimes QRSlv makes use of a Brent solver for blocks with a single equation. Is that the best approach when a GPU is available?&lt;br /&gt;
# More large demonstration models are needed. Let&#039;s go and find some.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or later). The card must support double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA developer driver, toolkit and SDK must be installed on the host machine, and the BinCUDA Makefile must be pointed at the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar, but the Ubuntu 10.04 32-bit download addresses below should be replaced with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
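2) Switch to a text console by pressing CTRL+ALT+F1 and log in. Then stop the X server (via gdm), run the driver installer from the directory where it was downloaded, and restart gdm:&lt;br /&gt;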
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. Next, install the CUDA 3.2 toolkit and the SDK samples (using the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, is recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to the ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) Open a new terminal (or run source ~/.bashrc) so that the new settings take effect. You should then be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be adjusted by multiplying the two size parameters, 100 and 51, by the same factor. For example, {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} scales both by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation time and the number of equations in the model.&lt;br /&gt;
It writes the generated code to the &amp;quot;/tmp&amp;quot; directory, where a Makefile builds the BinCUDA shared binary object. The CUDA build and compile commands are provided in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To use a different benchmark model, change the FILENAMESTEM macro in the test code. Note that if your model uses any ASCEND-specific function (e.g. asc_ipow), that function must also be defined in btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}).&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2564</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2564"/>
		<updated>2011-06-09T09:31:49Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Development&amp;amp;Test Plan */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Project Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use the solver&lt;br /&gt;
** General architecture improvements&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test into “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernel memory access pattern to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture/constant memory&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation, so that the CPU can evaluate the small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
*** Support models containing &#039;external relations&#039;&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current kernels from 2D to 3D; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns its index&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in the PyGTK GUI&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** Test for CUDA hardware availability when the solver is first loaded; make QRCUDA available only if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard to pinned (page-locked) memory. This makes data transfers between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluations, so the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable.&lt;br /&gt;
&lt;br /&gt;
== To-do list ==&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is consumed in the rel_set_residual() calls. This function should be optimized.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or later). The card must support double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA developer driver, toolkit and SDK must be installed on the host machine, and the BinCUDA Makefile must be pointed at the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar, but the Ubuntu 10.04 32-bit download addresses below should be replaced with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
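2) Switch to a text console by pressing CTRL+ALT+F1 and log in. Then stop the X server (via gdm), run the driver installer from the directory where it was downloaded, and restart gdm:&lt;br /&gt;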
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. Next, install the CUDA 3.2 toolkit and the SDK samples (using the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, is recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
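&lt;br /&gt;
To confirm that the new settings are active (for example after opening a new terminal), you can run a quick check such as the one below. This is only a sketch: cuda_env_check is a hypothetical helper, not part of ASCEND or the CUDA toolkit, and it assumes the default /usr/local/cuda install directories used above. It wraps each variable in colons so that only whole colon-separated entries match.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# cuda_env_check (hypothetical helper): report whether the CUDA directories&lt;br /&gt;
# from step 4 appear as whole entries in PATH and LD_LIBRARY_PATH&lt;br /&gt;
cuda_env_check() {&lt;br /&gt;
    case :$PATH: in&lt;br /&gt;
        *:/usr/local/cuda/bin:*) echo PATH ok ;;&lt;br /&gt;
        *) echo PATH is missing /usr/local/cuda/bin ;;&lt;br /&gt;
    esac&lt;br /&gt;
    case :$LD_LIBRARY_PATH: in&lt;br /&gt;
        *:/usr/local/cuda/lib:*) echo LD_LIBRARY_PATH ok ;;&lt;br /&gt;
        *) echo LD_LIBRARY_PATH is missing /usr/local/cuda/lib ;;&lt;br /&gt;
    esac&lt;br /&gt;
}&lt;br /&gt;
cuda_env_check&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;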
&lt;br /&gt;
5) You should now be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
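&lt;br /&gt;
Once the samples are built, the SDK&#039;s deviceQuery sample (in the same release directory) should report the card&#039;s compute capability as a major and a minor revision number. The helper below is a hypothetical sketch, not part of ASCEND or the SDK; it checks such a pair against the compute capability 1.3 requirement for double-precision arithmetic given above.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# supports_double MAJOR MINOR (hypothetical helper): exit status 0 if compute&lt;br /&gt;
# capability MAJOR.MINOR is 1.3 or higher, i.e. the card supports doubles&lt;br /&gt;
supports_double() {&lt;br /&gt;
    if [ $1 -gt 1 ]; then&lt;br /&gt;
        return 0&lt;br /&gt;
    fi&lt;br /&gt;
    [ $1 -eq 1 ] || return 1&lt;br /&gt;
    [ $2 -ge 3 ]&lt;br /&gt;
}&lt;br /&gt;
# e.g. a Fermi card reports compute capability 2.0&lt;br /&gt;
supports_double 2 0 || echo this card cannot run BinCUDA objects&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;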
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be adjusted by scaling the two parameters, 100 and 51, by the same multiplicative factor. For example, {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} uses a factor of 5: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
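&lt;br /&gt;
Because both parameters are scaled by the same factor, the values for any factor can be computed directly. The helper below is a hypothetical illustration (not part of ASCEND) that multiplies the original values 100 and 51 by the integer factor given as its argument; it only computes the two demo_column arguments and does not predict the resulting number of equations.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# column_args FACTOR (hypothetical helper): print the two demo_column size&lt;br /&gt;
# parameters, 100 and 51, each multiplied by the integer FACTOR&lt;br /&gt;
column_args() {&lt;br /&gt;
    echo $((100 * $1)),$((51 * $1))&lt;br /&gt;
}&lt;br /&gt;
column_args 5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With a factor of 5 this prints 500,255, the values used in larg_distil.a4c.&lt;br /&gt;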
&lt;br /&gt;
Alternatively, several columns can be instantiated instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}): &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, and the Makefile located in the same directory builds the shared binary object for the BinCUDAs. The CUDA build and compile commands are provided in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro in the code. [Note that if your model uses any ASCEND-specific function (e.g. asc_ipow), that function must be defined in the btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}) file.]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2563</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2563"/>
		<updated>2011-06-09T09:28:54Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Development &amp;amp; Test Plan */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Development&amp;amp;Test Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use your solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to the “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernels&#039; memory access pattern to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture/constant memory&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation. This way, the CPU can evaluate the small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
*** Support models containing &#039;external relations&#039;&lt;br /&gt;
&lt;br /&gt;
* Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current kernels from 2D to 3D; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns its index&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in the PyGTK GUI&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** Implement testing of CUDA hardware availability when the solver is first loaded; only make QRCUDA available if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard (pageable) to pinned memory. This makes data transfers between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluations, so the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable.&lt;br /&gt;
&lt;br /&gt;
== To-do list ==&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in calls to rel_set_residual(), so this function should be optimized.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine must have an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or newer). The card must support double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and the developer driver must be installed on the host machine, and the BinCUDA Makefile must be pointed at the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following steps install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; just replace the Ubuntu 10.04 32-bit download URLs below with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make the installer executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, and then stop the X server (GDM), run the driver installer and restart GDM:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. Next, install the CUDA 3.2 toolkit and the SDK samples (the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, are recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) You should now be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be adjusted by scaling the two parameters, 100 and 51, by the same multiplicative factor. For example, {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} uses a factor of 5: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be instantiated instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}): &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, and the Makefile located in the same directory builds the shared binary object for the BinCUDAs. The CUDA build and compile commands are provided in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro in the code. [Note that if your model uses any ASCEND-specific function (e.g. asc_ipow), that function must be defined in the btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}) file.]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2562</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2562"/>
		<updated>2011-06-09T09:26:09Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Development&amp;amp;Test Plan */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Development&amp;amp;Test Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use your solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to the “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernels&#039; memory access pattern to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture/constant memory&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation. This way, the CPU can evaluate the small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
*** Support models containing &#039;external relations&#039;&lt;br /&gt;
&lt;br /&gt;
* Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator&lt;br /&gt;
** Research all of the variations of Armijo&#039;s rule (Grippo et al., 1986)&lt;br /&gt;
** Convert the current kernels from 2D to 3D; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels&lt;br /&gt;
** Implement a separate kernel that finds the lowest residual norm and returns its index&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in the PyGTK GUI&lt;br /&gt;
** Add GUI menus and dialogs&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API&lt;br /&gt;
*** Implement testing of CUDA hardware availability when the solver is first loaded; only make QRCUDA available if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard (pageable) to pinned memory. This makes data transfers between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluations, so the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so it is now solvable.&lt;br /&gt;
&lt;br /&gt;
== To-do list ==&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in calls to rel_set_residual(), so this function should be optimized.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine must have an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or newer). The card must support double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and the developer driver must be installed on the host machine, and the BinCUDA Makefile must be pointed at the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following steps install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; just replace the Ubuntu 10.04 32-bit download URLs below with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make the installer executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, and then stop the X server (GDM), run the driver installer and restart GDM:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. Next, install the CUDA 3.2 toolkit and the SDK samples (the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, are recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) You should now be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be adjusted by scaling the two parameters, 100 and 51, by the same multiplicative factor. For example, {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} uses a factor of 5: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be instantiated instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}): &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, and the Makefile located in the same directory builds the shared binary object for the BinCUDAs. The CUDA build and compile commands are provided in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro in the code. [Note that if your model uses any ASCEND-specific function (e.g. asc_ipow), that function must be defined in the btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}) file.]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2561</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2561"/>
		<updated>2011-06-09T09:24:29Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Goals */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
GSOC-2011 Goals&lt;br /&gt;
&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
* Integrate the approach into QRCUDA&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Development&amp;amp;Test Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Clear step-by-step instructions allowing a new user to set up and test/use your solver&lt;br /&gt;
** General architecture improvement&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test to the “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernels&#039; memory access pattern to coalesced access&lt;br /&gt;
*** Store the mapping information in fast texture/constant memory&lt;br /&gt;
*** Change the memory management model from standard (pageable) to pinned (page-locked) memory. This makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation. This way, the CPU can evaluate the small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
*** Support models containing &#039;external relations&#039;&lt;br /&gt;
&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator.&lt;br /&gt;
** Research the variations of Armijo&#039;s rule, including the non-monotone line search (Grippo et al., 1986).&lt;br /&gt;
** Convert the current kernels from 2D to 3D; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels.&lt;br /&gt;
** Implement a separate kernel that computes the residual norm of each input vector and returns the index of the lowest residual norm.&lt;br /&gt;
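&lt;br /&gt;
A CPU reference is handy for checking the planned reduction kernel. The sketch below is only an illustration under assumed conventions. The function name lowest_norm_index and the one-vector-per-line text format are hypothetical and not part of BinCUDA. Each input line holds the residual vector for one candidate point, and the function prints the 0-based index of the line with the smallest Euclidean norm.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical CPU reference for the planned "lowest residual norm" kernel.
# Input: one residual vector per line, entries separated by whitespace.
# Output: 0-based index of the line whose Euclidean norm is smallest
# (the first such line wins on ties).
lowest_norm_index() {
  awk '
    {
      # Sum of squares: comparing squared norms gives the same ordering
      # as comparing the norms themselves, so no sqrt is needed.
      s = 0
      for (i = NF; i > 0; i--) s += $i * $i
      if (NR == 1 || best > s) { best = s; idx = NR - 1 }
    }
    END { if (NR) print idx }
  '
}

# Example: the norms are 5, sqrt(2) and 2, so index 1 is printed.
printf '3 4\n1 1\n0 2\n' | lowest_norm_index
```

A GPU version would typically compute the per-vector sums in parallel and then run a min-reduction over them. This shell function only defines the expected result.&lt;br /&gt;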
&lt;br /&gt;
* Integrate the approach to QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug.&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in the PyGTK GUI.&lt;br /&gt;
** Add GUI menus and dialogs.&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API.&lt;br /&gt;
*** Test CUDA hardware availability when the solver is first loaded; make QRCUDA available only if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
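&lt;br /&gt;
Inside the solver, the availability test in the last item would query the CUDA runtime when QRCUDA is loaded. As a rough shell-level pre-flight check, one can at least verify that the NVIDIA kernel module is loaded and that a device node exists. The sketch below assumes the usual Linux locations /proc/driver/nvidia/version and /dev/nvidia0. The name cuda_available is hypothetical, and passing the check does not prove that the card supports double precision.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical pre-flight check: succeed (exit status 0) only if the NVIDIA
# kernel module reports a version and the first GPU device node exists.
# Both paths can be overridden, which also makes the function easy to test.
cuda_available() (
  version_file=${1:-/proc/driver/nvidia/version}
  device_node=${2:-/dev/nvidia0}
  if [ ! -r "$version_file" ]; then
    echo "QRCUDA disabled: NVIDIA driver not loaded ($version_file missing)"
    exit 1
  fi
  if [ ! -e "$device_node" ]; then
    echo "QRCUDA disabled: no CUDA device node ($device_node missing)"
    exit 1
  fi
  echo "CUDA driver found: $(head -n 1 "$version_file")"
)

# Example usage: only offer the GPU solver when the check passes.
if cuda_available; then echo "QRCUDA can be enabled"; else echo "falling back to CPU solvers"; fi
```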
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard to PINNED (page-locked) memory, which makes data transfers between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluations, so the CPU can evaluate small equation groups while the GPU is busy with the large ones.&lt;br /&gt;
** The benchmark model was modified slightly so that it is now solvable.&lt;br /&gt;
&lt;br /&gt;
== To-do list ==&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. This function should be optimized.&lt;br /&gt;
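&lt;br /&gt;
Amdahl&#039;s law gives a rough ceiling on the benefit. If a fraction p of the run time is sped up by a factor s, the overall speedup is 1/((1 - p) + p/s). With p = 0.6, even an infinitely fast rel_set_residual() would make the batch evaluator at most 1/0.4 = 2.5 times faster. A small helper for such estimates follows; it is hypothetical and not part of ASCEND.&lt;br /&gt;
&lt;br /&gt;
```shell
# Amdahl's law: overall speedup when a fraction p of the run time
# is accelerated by a factor s. Prints the result with two decimals.
amdahl_speedup() {
  awk -v p="$1" -v s="$2" 'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / s) }'
}

amdahl_speedup 0.6 2      # rel_set_residual() twice as fast: prints 1.43
amdahl_speedup 0.6 1e9    # practically the upper bound: prints 2.50
```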
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably with the Fermi architecture or newer) that supports double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and BinCUDA&#039;s Makefile must be pointed to the SDK directory.&lt;br /&gt;
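&lt;br /&gt;
The compute capability of an installed card can be read, for example, from the output of the deviceQuery sample, which is built together with the SDK samples below. A hypothetical helper that checks a capability given as MAJOR.MINOR against the 1.3 minimum for double precision:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: succeed if a compute capability "MAJOR.MINOR"
# is at least 1.3, the first version with double-precision support.
supports_double() (
  major=${1%%.*}
  minor=${1#*.}
  if [ "$major" -gt 1 ]; then exit 0; fi
  [ "$major" -eq 1 ] || exit 1
  [ "$minor" -ge 3 ]
)

for cc in 1.1 1.3 2.0; do
  if supports_double "$cc"; then echo "$cc: double supported"; else echo "$cc: no double"; fi
done
```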
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The process on other Linux distributions is quite similar; replace the Ubuntu 10.04 32-bit file addresses below with the equivalent downloads for your distribution from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, and then stop the X server, run the driver installer and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. Next, install the CUDA 3.2 toolkit and the SDK samples (using the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, is recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending these lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
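&lt;br /&gt;
Appending the lines above to ~/.bashrc more than once adds the CUDA directories repeatedly. Also, if LD_LIBRARY_PATH was empty, the result starts with a colon, which the dynamic linker treats as the current directory. A guarded variant is sketched below; path_has is a hypothetical helper name. On 64-bit systems the toolkit libraries are normally in /usr/local/cuda/lib64 instead of /usr/local/cuda/lib.&lt;br /&gt;
&lt;br /&gt;
```shell
# Succeed if directory $1 appears as a complete entry in the
# colon-separated list $2 (for example "$PATH").
path_has() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Guarded version of the ~/.bashrc lines: each directory is added only once,
# and no empty entry is created when the variable starts out empty.
path_has /usr/local/cuda/bin "${PATH:-}" || PATH=${PATH:+$PATH:}/usr/local/cuda/bin
path_has /usr/local/cuda/lib "${LD_LIBRARY_PATH:-}" || LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/local/cuda/lib
export PATH LD_LIBRARY_PATH
```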
&lt;br /&gt;
5) You should now be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
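&lt;br /&gt;
This can also be done from the shell. The sketch below assumes the Makefile assigns the variable on a line starting with CUDA_SAMPLES, so check the actual file first. set_cuda_samples is a hypothetical helper, and sed -i is the GNU sed in-place option. For a one-off manual build, make also accepts a command-line override such as make CUDA_SAMPLES=$HOME/NVIDIA_GPU_Computing_SDK, which leaves the file untouched.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: rewrite the CUDA_SAMPLES assignment in a Makefile.
# Usage: set_cuda_samples path/to/Makefile /path/to/NVIDIA_GPU_Computing_SDK
set_cuda_samples() (
  makefile=$1
  sdk_dir=$2
  # Accept "=", ":=" and "?=" assignments at the start of a line.
  if ! grep -q '^CUDA_SAMPLES[[:space:]]*[?:]*=' "$makefile"; then
    echo "no CUDA_SAMPLES assignment found in $makefile"
    exit 1
  fi
  # "|" is the sed delimiter, so slashes in the path need no escaping.
  sed -i "s|^CUDA_SAMPLES[[:space:]]*[?:]*=.*|CUDA_SAMPLES = $sdk_dir|" "$makefile"
)

# Example (adjust the paths to your checkout and SDK location):
# set_cuda_samples ascend/bintokens/bincuda/Makefile "$HOME/NVIDIA_GPU_Computing_SDK"
```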
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique equation symbolic forms and 19959 equation instances. The number of relations in the model can be adjusted by multiplying the two parameters, 100 and 51, by the same factor. For example, in {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} both are multiplied by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
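&lt;br /&gt;
Scaled copies can also be generated rather than edited by hand. The sketch below uses scale_distil, a hypothetical helper, to multiply both parameters by the same integer factor. It assumes the base model contains the literal text ,100,51); as in the original model above. A factor of 5 reproduces the 500,255 example.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: write a copy of an ASCEND model with the two
# demo_column parameters (100 and 51 in the base model) multiplied
# by an integer factor.
# Usage: scale_distil FACTOR INPUT.a4c OUTPUT.a4c
scale_distil() (
  factor=$1
  p1=$((100 * factor))
  p2=$((51 * factor))
  if ! grep -q ',100,51);' "$2"; then
    echo "parameters ,100,51); not found in $2"
    exit 1
  fi
  sed "s/,100,51);/,$p1,$p2);/" "$2" > "$3"
)

# Example:
# scale_distil 5 models/test/bintok/larg_distil.a4c /tmp/larg_distil_x5.a4c
```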
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, where a Makefile builds the shared binary object for the BinCUDAs. The CUDA build and compile commands are provided in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro in the test code. Note that if your model uses any ASCEND-specific function (e.g. asc_ipow), that function must also be defined in btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}).&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2560</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2560"/>
		<updated>2011-06-09T09:23:40Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Goals */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
* Implement the batch multi-vector residual evaluator.&lt;br /&gt;
* Integrate the approach into QRCUDA.&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
&lt;br /&gt;
== Development&amp;amp;Test Plan ==&lt;br /&gt;
* Complete the current prototype.&lt;br /&gt;
** Write clear step-by-step instructions that allow a new user to set up, test and use the solver.&lt;br /&gt;
** Make general architectural improvements.&lt;br /&gt;
** Move the initialization and shutdown tasks from the unit test into “QRCUDA.c”.&lt;br /&gt;
** Fix the distillation case study; the current model is unsolvable.&lt;br /&gt;
** Optimise the CUDA code&lt;br /&gt;
*** Change the kernels&#039; memory access pattern to coalesced access.&lt;br /&gt;
*** Store the mapping information in fast texture or constant memory.&lt;br /&gt;
*** Change the memory management model from standard (pageable) to PINNED (page-locked) memory; this makes memory transfers between host and device faster.&lt;br /&gt;
** Implement hybrid CPU/GPU evaluation instead of purely GPU-based evaluation, so that the CPU can evaluate the small equation groups while the GPU is busy with the large ones.&lt;br /&gt;
*** Support models containing &#039;external relations&#039;.&lt;br /&gt;
&lt;br /&gt;
** Prepare a multi-platform Makefile to compile and build BinCUDAs.&lt;br /&gt;
** Complete the external functions in “btcudapl.cu”&lt;br /&gt;
&lt;br /&gt;
* Implement the batch multi-vector residual evaluator&lt;br /&gt;
** Define the heuristic formula for the multi-vector residual evaluator.&lt;br /&gt;
** Research the variations of Armijo&#039;s rule, including the non-monotone line search (Grippo et al., 1986).&lt;br /&gt;
** Convert the current kernels from 2D to 3D; the extra dimension indexes the input vectors.&lt;br /&gt;
** Implement the heuristic formula in the kernels.&lt;br /&gt;
** Implement a separate kernel that computes the residual norm of each input vector and returns the index of the lowest residual norm.&lt;br /&gt;
&lt;br /&gt;
* Integrate the approach to QRCUDA&lt;br /&gt;
** Add a block evaluation feature to the batch single-vector evaluator.&lt;br /&gt;
** Modify the standard residual/gradient evaluator to use the new single-vector evaluator.&lt;br /&gt;
** Integrate the batch multi-vector evaluator into the QRCUDA line search.&lt;br /&gt;
** Modify the current line search algorithm to use the batch multi-vector evaluator.&lt;br /&gt;
** Benchmark the results.&lt;br /&gt;
&lt;br /&gt;
* Integrate QRCUDA into the ASCEND GUI.&lt;br /&gt;
** Fix the Bintoken unloading bug.&lt;br /&gt;
** Fix the Bintoken auto-rebuild sensing feature in the PyGTK GUI.&lt;br /&gt;
** Add GUI menus and dialogs.&lt;br /&gt;
*** Ensure all required user-configurable parameters are exposed through the solver API.&lt;br /&gt;
*** Test CUDA hardware availability when the solver is first loaded; make QRCUDA available only if the tests succeed, and give the user feedback if they fail.&lt;br /&gt;
&lt;br /&gt;
* Test the project with different hardware and software platforms.&lt;br /&gt;
** Test for memory leaks and stability.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard to PINNED (page-locked) memory, which makes data transfers between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluations, so the CPU can evaluate small equation groups while the GPU is busy with the large ones.&lt;br /&gt;
** The benchmark model was modified slightly so that it is now solvable.&lt;br /&gt;
&lt;br /&gt;
== To-do list ==&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. This function should be optimized.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably with the Fermi architecture or newer) that supports double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and BinCUDA&#039;s Makefile must be pointed to the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The process on other Linux distributions is quite similar; replace the Ubuntu 10.04 32-bit file addresses below with the equivalent downloads for your distribution from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, and then stop the X server, run the driver installer and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. Next, install the CUDA 3.2 toolkit and the SDK samples (using the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, is recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending these lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) You should now be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique equation symbolic forms and 19959 equation instances. The number of relations in the model can be adjusted by multiplying the two parameters, 100 and 51, by the same factor. For example, in {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} both are multiplied by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, where a Makefile builds the shared binary object for the BinCUDAs. The CUDA build and compile commands are provided in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro in the test code. Note that if your model uses any ASCEND-specific function (e.g. asc_ipow), that function must also be defined in btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}).&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2559</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2559"/>
		<updated>2011-06-09T09:10:21Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Installation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) as bintokens.&lt;br /&gt;
** Reinstate bintoken functionality.&lt;br /&gt;
** Add gradient calculation support to bintokens.&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver using this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (accelerator_mgr) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver: whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to accelerator_mgr.&lt;br /&gt;
* Wrap the appropriate functionality in an ASCEND solver interface that decouples the accelerator manager from the solver. The interface should provide batch residual (and Jacobian) evaluation for groups of relations.&lt;br /&gt;
* Benchmark the results and possibly move to other many-core (or multi-core) architectures and languages.&lt;br /&gt;
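&lt;br /&gt;
The line search is where batched residual evaluation pays off, because several trial points can be evaluated at once. The sketch below is an illustration only, not the QRCUDA implementation. It applies the standard Armijo sufficient-decrease test f(a) ≤ f(0) + c·a·g0 to trial step lengths a whose merit values f(a) have already been computed, and it returns the first one that passes. Input is one line per trial point of the form a f(a), ordered from the largest step to the smallest. Here g0 is the (negative) directional derivative of f at the current point, and c is a small constant such as 1e-4. armijo_pick is a hypothetical name. For a Newton direction and the merit function f = ½‖F‖², g0 = -2 f(0).&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical sketch of an Armijo step selection over batched trial points.
# Usage: armijo_pick F0 G0 [C], with lines "alpha f(alpha)" on stdin,
# ordered from the largest trial step to the smallest.
# Prints the first alpha whose f(alpha) is at most F0 + C * alpha * G0;
# exits with status 1 if no trial step passes.
armijo_pick() {
  awk -v f0="$1" -v g0="$2" -v c="${3:-1e-4}" '
    # Sufficient decrease: accept when the bound is not below f(alpha).
    f0 + c * $1 * g0 >= $2 { print $1; found = 1; exit }
    END { if (!found) exit 1 }
  '
}

# Example: f(0) = 1 and g0 = -2 (a Newton step on f = 0.5*||F||^2).
# The full step (alpha = 1) increases f, so the half step 0.5 is printed.
printf '1 1.5\n0.5 0.3\n0.25 0.6\n' | armijo_pick 1 -2
```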
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23-May&lt;br /&gt;
** The GPU memory management model was changed from standard to PINNED (page-locked) memory, which makes data transfers between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluations, so the CPU can evaluate small equation groups while the GPU is busy with the large ones.&lt;br /&gt;
** The benchmark model was modified slightly so that it is now solvable.&lt;br /&gt;
&lt;br /&gt;
== To-do list ==&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. This function should be optimized.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably with the Fermi architecture or newer) that supports double-precision floating-point arithmetic (compute capability 1.3 or higher, i.e. compute_13+).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and BinCUDA&#039;s Makefile must be pointed to the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The process on other Linux distributions is quite similar; replace the Ubuntu 10.04 32-bit file addresses below with the equivalent downloads for your distribution from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, and then stop the X server, run the driver installer and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. Next, install the CUDA 3.2 toolkit and the SDK samples (using the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, is recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending these lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) You should now be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The executable binaries for the samples are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique equation symbolic forms and 19959 equation instances. The number of relations in the model can be adjusted by multiplying the two parameters, 100 and 51, by the same factor. For example, in {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} both are multiplied by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation time and the number of equations in the model.&lt;br /&gt;
It generates the code in the &amp;quot;/tmp&amp;quot; directory, where a Makefile builds the shared binary object for the BinCUDAs. The CUDA build and compile commands are provided in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro in the test code. Note that if your model uses any ASCEND-specific function (e.g. asc_ipow), that function must also be defined in btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}).&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2482</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2482"/>
		<updated>2011-05-27T12:00:08Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* To-do list */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) to bintokens.&lt;br /&gt;
** Reinstate bintoken functionality.&lt;br /&gt;
** Add gradient calculation support to bintokens.&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver with this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (accelerator_mgr) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver: whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to accelerator_mgr.&lt;br /&gt;
* Wrap the appropriate functionality in an ASCEND solver interface that decouples the accelerator manager from the solver. (The interface should provide batch residual and Jacobian evaluation for groups of relations.)&lt;br /&gt;
* Benchmark the results and possibly move to other many-core (or multi-core) architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23 May&lt;br /&gt;
** The memory management model was changed from standard (pageable) to pinned (page-locked) host memory. This makes data transfers between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluations, so the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so that it is now solvable.&lt;br /&gt;
&lt;br /&gt;
== To-do list ==&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. This function should be optimized.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably Fermi or a more recent architecture).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and the BinCUDA Makefile must point to the SDK directory.&lt;br /&gt;
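&lt;br /&gt;
Before installing anything, you can confirm that an NVIDIA card is present by searching the lspci listing. The helper below is a hypothetical sketch. It only detects an NVIDIA display or 3D controller; it does not check CUDA capability or whether the card is Fermi-class.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of ASCEND): succeeds if the lspci listing
# read from standard input contains an NVIDIA display (VGA) or compute (3D)
# controller. It does not check CUDA capability or the GPU architecture.
has_nvidia_gpu() {
    grep -Eiq '(VGA compatible|3D) controller: .*nvidia'
}

# Usage on the target machine:
#   if lspci | has_nvidia_gpu; then echo "NVIDIA GPU found"; fi
```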
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The process on other Linux distributions is quite similar: replace the Ubuntu 10.04 32-bit file addresses below with the equivalent files for your distribution from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, then stop X Windows, run the driver installer and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X Windows should now restart with the new NVIDIA driver. Next, install the CUDA 3.2 toolkit and the SDK samples (using the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, is recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following to ~/.bashrc:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) Open a new terminal (so that the updated ~/.bashrc takes effect) and compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
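&lt;br /&gt;
After step 6, the hypothetical function below (not part of the CUDA SDK or ASCEND) re-checks the results of steps 2 and 4. It reports whether the NVIDIA kernel module is loaded, using /proc/driver/nvidia/version, and whether the CUDA directories are on PATH and LD_LIBRARY_PATH. CUDA_PREFIX and NV_VERSION_FILE are illustrative overrides. The defaults match the recommended install locations above.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical post-install check for steps 2 and 4 (not part of ASCEND).
# CUDA_PREFIX defaults to /usr/local/cuda, the recommended toolkit location;
# NV_VERSION_FILE defaults to the file the proprietary NVIDIA kernel module
# provides once it is loaded.
check_cuda_install() {
    prefix=${CUDA_PREFIX:-/usr/local/cuda}
    version_file=${NV_VERSION_FILE:-/proc/driver/nvidia/version}
    status=0

    # Step 2: is the NVIDIA kernel driver loaded?
    if [ -r "$version_file" ]; then
        echo "driver: $(head -n 1 "$version_file")"
    else
        echo "driver: NVIDIA kernel module not loaded"
        status=1
    fi

    # Step 4: wrapping the lists in ':' also matches first and last entries.
    case ":$PATH:" in
        *":$prefix/bin:"*) echo "PATH: ok" ;;
        *) echo "PATH: missing $prefix/bin"; status=1 ;;
    esac
    case ":$LD_LIBRARY_PATH:" in
        *":$prefix/lib:"*) echo "LD_LIBRARY_PATH: ok" ;;
        *) echo "LD_LIBRARY_PATH: missing $prefix/lib"; status=1 ;;
    esac
    return $status
}
```
&lt;br /&gt;
Running nvcc --version is another quick check that the toolkit in /usr/local/cuda/bin is found.&lt;br /&gt;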
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
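&lt;br /&gt;
The Makefile can be edited by hand. Alternatively, the hypothetical helper below makes the change with GNU sed. It assumes that CUDA_SAMPLES is assigned on a single line beginning with CUDA_SAMPLES (using =, := or ?=). The example path is the samples directory used in step 5. Confirm first whether the Makefile expects the SDK root or its C subdirectory.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of ASCEND): point CUDA_SAMPLES in the BinCUDA
# Makefile at the SDK samples directory. Assumes a one-line assignment that
# starts with "CUDA_SAMPLES" (=, := or ?=) and GNU sed (for -i).
set_cuda_samples() {
    makefile=$1
    sdk_dir=$2
    if [ ! -d "$sdk_dir" ]; then
        echo "SDK samples directory not found: $sdk_dir"
        return 1
    fi
    if ! grep -q '^CUDA_SAMPLES[[:space:]]*[?:]*=' "$makefile"; then
        echo "no CUDA_SAMPLES assignment in $makefile"
        return 1
    fi
    sed -i "s|^CUDA_SAMPLES[[:space:]]*[?:]*=.*|CUDA_SAMPLES = $sdk_dir|" "$makefile"
}

# Usage (from the top-level ASCEND directory):
#   set_cuda_samples ascend/bintokens/bincuda/Makefile "$HOME/NVIDIA_GPU_Computing_SDK/C"
```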
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations can be adjusted by multiplying the two parameters, 100 and 51, by a common factor. For example, {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} uses a factor of 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be used instead of a single one ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
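&lt;br /&gt;
For benchmarking at other sizes, the two variants above can be combined. The hypothetical generator below prints an a4c model with a given number of demo_column instances. It multiplies both size parameters (100 and 51) by a common factor. The model name gen_distil is illustrative, and the equation count of the generated models has not been measured.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical generator (not part of ASCEND): print an a4c model with NCOLS
# demo_column instances whose size parameters (100 and 51 in the base model)
# are both multiplied by FACTOR.
gen_distil_model() {
    ncols=${1:-1}
    factor=${2:-1}
    p1=$((100 * factor))
    p2=$((51 * factor))
    # Instance names follow larg_distil_2.a4c: demo, demo2, demo3, ...
    names=demo
    i=2
    while [ "$i" -le "$ncols" ]; do
        names="$names,demo$i"
        i=$((i + 1))
    done
    echo 'REQUIRE "column.a4l";'
    echo 'MODEL gen_distil() REFINES test_demo_column();'
    echo "        $names IS_A"
    printf "        demo_column(['n_butane','n_pentane','n_hexane','n_heptane','n_octane','n_nonane','n_decane'],'n_decane',%d,%d);\n" "$p1" "$p2"
    echo 'METHODS'
    echo 'END gen_distil;'
}
```
&lt;br /&gt;
For example, gen_distil_model 4 1 reproduces the instance list and parameters of larg_distil_2.a4c. gen_distil_model 1 5 produces the 500 and 255 parameters of larg_distil.a4c. Redirect the output to a .a4c file to load it in ASCEND.&lt;br /&gt;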
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level&lt;br /&gt;
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test outputs the CPU-based evaluation time, the GPU-based evaluation&lt;br /&gt;
time and the number of equations in the model.&lt;br /&gt;
It writes the generated code to the &amp;quot;/tmp&amp;quot; directory, where a Makefile&lt;br /&gt;
builds the BinCUDA shared object. The CUDA build and compile commands are&lt;br /&gt;
defined in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro in&lt;br /&gt;
test_bincuda.c. Note that if your model uses an ASCEND-specific function&lt;br /&gt;
(e.g. asc_ipow), that function must also be defined in&lt;br /&gt;
btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}).&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2481</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2481"/>
		<updated>2011-05-27T11:58:53Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Progress */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) to bintokens.&lt;br /&gt;
** Reinstate bintoken functionality.&lt;br /&gt;
** Add gradient calculation support to bintokens.&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver with this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (accelerator_mgr) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver: whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to accelerator_mgr.&lt;br /&gt;
* Wrap the appropriate functionality in an ASCEND solver interface that decouples the accelerator manager from the solver. (The interface should provide batch residual and Jacobian evaluation for groups of relations.)&lt;br /&gt;
* Benchmark the results and possibly move to other many-core (or multi-core) architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23 May&lt;br /&gt;
** The memory management model was changed from standard (pageable) to pinned (page-locked) host memory. This makes data transfers between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluations, so the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so that it is now solvable.&lt;br /&gt;
&lt;br /&gt;
== To-do list ==&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. This function should be optimized.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably Fermi or a more recent architecture).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and the BinCUDA Makefile must point to the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The process on other Linux distributions is quite similar: replace the Ubuntu 10.04 32-bit file addresses below with the equivalent files for your distribution from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, then stop X Windows, run the driver installer and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X Windows should now restart with the new NVIDIA driver. Next, install the CUDA 3.2 toolkit and the SDK samples (using the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, is recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following to ~/.bashrc:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) Open a new terminal (so that the updated ~/.bashrc takes effect) and compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations can be adjusted by multiplying the two parameters, 100 and 51, by a common factor. For example, {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} uses a factor of 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be used instead of a single one ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level&lt;br /&gt;
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test outputs the CPU-based evaluation time, the GPU-based evaluation&lt;br /&gt;
time and the number of equations in the model.&lt;br /&gt;
It writes the generated code to the &amp;quot;/tmp&amp;quot; directory, where a Makefile&lt;br /&gt;
builds the BinCUDA shared object. The CUDA build and compile commands are&lt;br /&gt;
defined in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro in&lt;br /&gt;
test_bincuda.c. Note that if your model uses an ASCEND-specific function&lt;br /&gt;
(e.g. asc_ipow), that function must also be defined in&lt;br /&gt;
btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}).&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2477</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2477"/>
		<updated>2011-05-27T07:53:07Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* To-do list */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) to bintokens.&lt;br /&gt;
** Reinstate bintoken functionality.&lt;br /&gt;
** Add gradient calculation support to bintokens.&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver with this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (accelerator_mgr) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver: whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to accelerator_mgr.&lt;br /&gt;
* Wrap the appropriate functionality in an ASCEND solver interface that decouples the accelerator manager from the solver. (The interface should provide batch residual and Jacobian evaluation for groups of relations.)&lt;br /&gt;
* Benchmark the results and possibly move to other many-core (or multi-core) architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23 May&lt;br /&gt;
** The memory management model was changed from standard (pageable) to pinned (page-locked) host memory. This makes data transfers between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluation, so the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so that it is now solvable.&lt;br /&gt;
&lt;br /&gt;
== To-do list ==&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. This function should be optimized.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably Fermi or a more recent architecture).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and the BinCUDA Makefile must point to the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The process on other Linux distributions is quite similar: replace the Ubuntu 10.04 32-bit file addresses below with the equivalent files for your distribution from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, then stop X Windows, run the driver installer and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X Windows should now restart with the new NVIDIA driver. Next, install the CUDA 3.2 toolkit and the SDK samples (using the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, is recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following to ~/.bashrc:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) Open a new terminal (so that the updated ~/.bashrc takes effect) and compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations can be adjusted by multiplying the two parameters, 100 and 51, by a common factor. For example, {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} uses a factor of 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be used instead of a single one ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level&lt;br /&gt;
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test outputs the CPU-based evaluation time, the GPU-based evaluation&lt;br /&gt;
time and the number of equations in the model.&lt;br /&gt;
It writes the generated code to the &amp;quot;/tmp&amp;quot; directory, where a Makefile&lt;br /&gt;
builds the BinCUDA shared object. The CUDA build and compile commands are&lt;br /&gt;
defined in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro in&lt;br /&gt;
test_bincuda.c. Note that if your model uses an ASCEND-specific function&lt;br /&gt;
(e.g. asc_ipow), that function must also be defined in&lt;br /&gt;
btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}).&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2476</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2476"/>
		<updated>2011-05-27T07:52:10Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Progress */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) to bintokens.&lt;br /&gt;
** Reinstate bintoken functionality.&lt;br /&gt;
** Add gradient calculation support to bintokens.&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver with this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (accelerator_mgr) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver: whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to accelerator_mgr.&lt;br /&gt;
* Wrap the appropriate functionality in an ASCEND solver interface that decouples the accelerator manager from the solver. (The interface should provide batch residual and Jacobian evaluation for groups of relations.)&lt;br /&gt;
* Benchmark the results and possibly move to other many-core (or multi-core) architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
* After 23 May&lt;br /&gt;
** The memory management model was changed from standard (pageable) to pinned (page-locked) host memory. This makes data transfers between host and device twice as fast.&lt;br /&gt;
** The batch evaluator can now perform hybrid CPU/GPU evaluation, so the CPU can be used for small equation groups while the GPU is busy evaluating the large groups.&lt;br /&gt;
** The benchmark model was modified slightly so that it is now solvable.&lt;br /&gt;
&lt;br /&gt;
== To-do list ==&lt;br /&gt;
&lt;br /&gt;
# In the batch evaluator (relman.c:relman_batch_eval), 60% of the total time is spent in rel_set_residual() calls. This function should be optimized.&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine needs an NVIDIA CUDA-enabled GPU (preferably Fermi or a more recent architecture).&lt;br /&gt;
In addition to the GPU hardware, the CUDA SDK and developer driver must be installed on the host machine, and the BinCUDA Makefile must point to the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following are step-by-step instructions for installing the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; however, the Ubuntu 10.04 32-bit file addresses used below should be replaced with the equivalent distribution files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, and then stop the X server, run the driver installer and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver, and you should be able to install the CUDA 3.2 toolkit and SDK samples (using the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, is recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) You should now be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory. For example, you should now be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
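&lt;br /&gt;
A possible refinement of step 4 (a sketch under stated assumptions, not part of the original instructions): the shell functions below, path_append and ld_path_append, are hypothetical helpers that add the CUDA directories only when they are not already listed, so sourcing ~/.bashrc more than once does not keep lengthening PATH and LD_LIBRARY_PATH. The directories assume the default install location recommended in step 3.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# Hypothetical helpers (not part of the CUDA SDK or ASCEND).
# Append a directory to PATH only if it is not already present.
path_append() {
    case :$PATH: in
        *:$1:*) ;;                # already listed: leave PATH unchanged
        *) PATH=$PATH:$1 ;;       # otherwise append it at the end
    esac
}

# Same for LD_LIBRARY_PATH, avoiding a leading empty entry
# (which the dynamic loader treats as the current directory) when unset.
ld_path_append() {
    case :$LD_LIBRARY_PATH: in
        *:$1:*) ;;
        *) LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1 ;;
    esac
}

path_append /usr/local/cuda/bin
ld_path_append /usr/local/cuda/lib
export PATH LD_LIBRARY_PATH
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
On a 64-bit installation the toolkit libraries are normally found in /usr/local/cuda/lib64 rather than /usr/local/cuda/lib.&lt;br /&gt;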
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be adjusted by multiplying the two parameters, 100 and 51, by the same factor. For example, in {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} both are multiplied by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
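&lt;br /&gt;
As an illustration of the scaling arithmetic only, the hypothetical shell function below (scale_column_params is not part of ASCEND) multiplies both demo_column size parameters by the same integer factor; a factor of 5 reproduces the 500 and 255 used in larg_distil.a4c.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# Hypothetical helper: scale the two demo_column size parameters
# (100 and 51 in the original model) by the same integer factor.
scale_column_params() {
    factor=$1
    echo $((100 * factor)),$((51 * factor))
}

scale_column_params 5    # prints 500,255
&amp;lt;/source&amp;gt;&lt;br /&gt;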
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; in the top-level
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation
time and the number of equations in the model.
It writes the generated code to the &amp;quot;/tmp&amp;quot; directory, where a Makefile
builds the shared binary object for the BinCUDAs. The CUDA
build and compile commands are provided in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To use a different benchmark model, change the FILENAMESTEM macro
in the code. [Note that if your model uses any ASCEND-specific
function (e.g. asc_ipow), that function must also be defined in the
btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}) file.]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2449</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2449"/>
		<updated>2011-05-25T12:16:46Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Installation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) to bintokens.&lt;br /&gt;
** Reinstate bintoken functionality.&lt;br /&gt;
** Add gradient calculation support to the bintoken code.&lt;br /&gt;
* Prepare a large model (preferably with 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver using this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (accelerator_mgr) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver: whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to accelerator_mgr.&lt;br /&gt;
* Wrap the appropriate functionality in the ASCEND solver interface so that the accelerator manager is decoupled from the solver. (The interface should provide batch residual (and Jacobian) evaluation for groups of relations.)&lt;br /&gt;
* Benchmark the results and possibly move to other many-core (or multi-core) architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;fill in here&#039;&#039;&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine must have an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or newer).
In addition to the GPU hardware, the CUDA SDK and the developer driver must be installed on the host machine, and BinCUDA&#039;s Makefile must be pointed at the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following are step-by-step instructions for installing the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; however, the Ubuntu 10.04 32-bit file addresses used below should be replaced with the equivalent distribution files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, and then stop the X server, run the driver installer and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver, and you should be able to install the CUDA 3.2 toolkit and SDK samples (using the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, is recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) You should now be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory. For example, you should now be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be adjusted by multiplying the two parameters, 100 and 51, by the same factor. For example, in {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} both are multiplied by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; in the top-level
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation
time and the number of equations in the model.
It writes the generated code to the &amp;quot;/tmp&amp;quot; directory, where a Makefile
builds the shared binary object for the BinCUDAs. The CUDA
build and compile commands are provided in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To use a different benchmark model, change the FILENAMESTEM macro
in the code. [Note that if your model uses any ASCEND-specific
function (e.g. asc_ipow), that function must also be defined in the
btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}) file.]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2448</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2448"/>
		<updated>2011-05-25T12:14:08Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Goals */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) to bintokens.&lt;br /&gt;
** Reinstate bintoken functionality.&lt;br /&gt;
** Add gradient calculation support to the bintoken code.&lt;br /&gt;
* Prepare a large model (preferably with 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver using this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (accelerator_mgr) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver: whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to accelerator_mgr.&lt;br /&gt;
* Wrap the appropriate functionality in the ASCEND solver interface so that the accelerator manager is decoupled from the solver. (The interface should provide batch residual (and Jacobian) evaluation for groups of relations.)&lt;br /&gt;
* Benchmark the results and possibly move to other many-core (or multi-core) architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;fill in here&#039;&#039;&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine must have an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or newer).
In addition to the GPU hardware, the CUDA SDK and the developer driver must be installed on the host machine, and BinCUDA&#039;s Makefile must be pointed at the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following are step-by-step instructions for installing the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; however, the Ubuntu 10.04 32-bit file addresses used below should be replaced with the equivalent distribution files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, and then stop the X server, run the driver installer and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver, and you should be able to install the CUDA 3.2 toolkit and SDK samples (using the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, is recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) You should now be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory. For example, you should now be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be adjusted by multiplying the two parameters, 100 and 51, by the same factor. For example, in {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} both are multiplied by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; in the top-level
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case outputs the CPU-based evaluation time, the GPU-based evaluation
time and the number of equations in the model.
It writes the generated code to the &amp;quot;/tmp&amp;quot; directory, where a Makefile
builds the shared binary object for the BinCUDAs. The CUDA
build and compile commands are provided in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To use a different benchmark model, change the FILENAMESTEM macro
in the code. [Note that if your model uses any ASCEND-specific
function (e.g. asc_ipow), that function must also be defined in the
btcudapl.cu ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}) file.]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2445</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2445"/>
		<updated>2011-05-25T10:03:39Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Installation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) to bintokens.&lt;br /&gt;
** Reinstate bintoken functionality.&lt;br /&gt;
** Add gradient calculation support to the bintoken code.&lt;br /&gt;
* Prepare a large model (preferably with 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver using this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (accelerator_mgr) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver: whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to GPU_manager.&lt;br /&gt;
* Wrap the appropriate functionality in the ASCEND solver interface so that the GPU manager is decoupled from the solver. (The interface should provide batch residual (and Jacobian) evaluation for groups of relations.)&lt;br /&gt;
* Benchmark the results and possibly move to other many-core (or multi-core) architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;fill in here&#039;&#039;&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine must have an NVIDIA CUDA-enabled GPU (preferably of the Fermi architecture or newer).
In addition to the GPU hardware, the CUDA SDK and the developer driver must be installed on the host machine, and BinCUDA&#039;s Makefile must be pointed at the SDK directory.&lt;br /&gt;
&lt;br /&gt;
=== Installing CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following are step-by-step instructions for installing the CUDA SDK on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; however, the Ubuntu 10.04 32-bit file addresses used below should be replaced with the equivalent distribution files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, and then stop the X server, run the driver installer and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver, and you should be able to install the CUDA 3.2 toolkit and SDK samples (using the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, is recommended):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) You should now be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory. For example, you should now be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. The number of relations in the model can be adjusted by multiplying the two parameters, 100 and 51, by the same factor. For example, in {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} both are multiplied by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
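&lt;br /&gt;
Both size arguments are multiplied by the same factor, so shell arithmetic can compute the new pair, as in the sketch below. scale_column is a hypothetical helper and is not part of ASCEND. A factor of 5 reproduces the 500,255 pair used above. The resulting number of equations is not derived here; read it from the test output.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# Hypothetical helper (not part of ASCEND): multiply the demo_column size&lt;br /&gt;
# arguments 100 and 51 by an integer factor k and print them comma-separated,&lt;br /&gt;
# ready to paste into the model (k=5 prints 500,255).&lt;br /&gt;
scale_column() {&lt;br /&gt;
  k=$1&lt;br /&gt;
  echo $((100 * k)),$((51 * k))&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;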
&lt;br /&gt;
Alternatively, several columns can be used instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level&lt;br /&gt;
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case reports the CPU-based evaluation time, the GPU-based evaluation&lt;br /&gt;
time, and the number of equations in the model.&lt;br /&gt;
It generates the BinCUDA code in the &amp;quot;/tmp&amp;quot; directory. A Makefile in that same directory&lt;br /&gt;
builds the shared binary object for the BinCUDAs. The CUDA&lt;br /&gt;
build and compile commands are defined in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro&lt;br /&gt;
in the test code. [Note that if your model uses any ASCEND-specific&lt;br /&gt;
function (e.g. asc_ipow), that function should also be defined in the&lt;br /&gt;
btcudapl.cu file ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}).]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2444</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2444"/>
		<updated>2011-05-25T09:54:37Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Installation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) to bintokens.&lt;br /&gt;
** Reinstate bintoken functionality.&lt;br /&gt;
** Add gradient calculation support to bintokens.&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver with this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (accelerator_mgr) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels, and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver; whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to GPU_manager.&lt;br /&gt;
* Wrap the appropriate functionality in an ASCEND solver interface that decouples the GPU manager from the solver (the interface should provide batch residual and Jacobian evaluation for groups of relations).&lt;br /&gt;
* Benchmark the results and possibly port the approach to other many-core or multi-core architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;fill in here&#039;&#039;&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the host machine should have an NVIDIA CUDA-enabled GPU (preferably Fermi or a newer architecture).&lt;br /&gt;
&lt;br /&gt;
The CUDA SDK and developer driver are necessary for executing BinCUDAs, and the BinCUDA Makefile should point to the SDK directory, as explained below.&lt;br /&gt;
&lt;br /&gt;
=== Install CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK and samples on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar. Replace the 32-bit Ubuntu 10.04 file addresses used here with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1 and log in. Then stop X, run the driver installer, and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. You can then install the CUDA 3.2 toolkit and SDK samples. Using the default directories (/usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK) is recommended:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to ~/.bashrc:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
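&lt;br /&gt;
After opening a new terminal (or running &amp;quot;source ~/.bashrc&amp;quot;), you can check the new settings with a short shell function like the sketch below. check_cuda_paths is a hypothetical helper and is not part of CUDA or ASCEND. It only tests for the exact 32-bit default directories used above.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
# Hypothetical helper (not part of CUDA or ASCEND): report whether the&lt;br /&gt;
# directories added in step 4 appear in PATH and LD_LIBRARY_PATH.&lt;br /&gt;
# Colons are added on both sides so that first and last entries also match.&lt;br /&gt;
check_cuda_paths() {&lt;br /&gt;
  case :$PATH: in&lt;br /&gt;
    *:/usr/local/cuda/bin:*) echo PATH ok ;;&lt;br /&gt;
    *) echo PATH missing /usr/local/cuda/bin ;;&lt;br /&gt;
  esac&lt;br /&gt;
  case :$LD_LIBRARY_PATH: in&lt;br /&gt;
    *:/usr/local/cuda/lib:*) echo LD_LIBRARY_PATH ok ;;&lt;br /&gt;
    *) echo LD_LIBRARY_PATH missing /usr/local/cuda/lib ;;&lt;br /&gt;
  esac&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once step 4 has taken effect, running check_cuda_paths should print &amp;quot;PATH ok&amp;quot; and &amp;quot;LD_LIBRARY_PATH ok&amp;quot;.&lt;br /&gt;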
&lt;br /&gt;
5) You should now be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to point to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. You can adjust the number of relations in the model by multiplying its two size parameters, 100 and 51, by the same factor. For example, {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} multiplies both by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be used instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level&lt;br /&gt;
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case reports the CPU-based evaluation time, the GPU-based evaluation&lt;br /&gt;
time, and the number of equations in the model.&lt;br /&gt;
It generates the BinCUDA code in the &amp;quot;/tmp&amp;quot; directory. A Makefile in that same directory&lt;br /&gt;
builds the shared binary object for the BinCUDAs. The CUDA&lt;br /&gt;
build and compile commands are defined in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro&lt;br /&gt;
in the test code. [Note that if your model uses any ASCEND-specific&lt;br /&gt;
function (e.g. asc_ipow), that function should also be defined in the&lt;br /&gt;
btcudapl.cu file ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}).]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2443</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2443"/>
		<updated>2011-05-25T09:53:30Z</updated>

		<summary type="html">&lt;p&gt;Arash: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) to bintokens.&lt;br /&gt;
** Reinstate bintoken functionality.&lt;br /&gt;
** Add gradient calculation support to bintokens.&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver with this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (accelerator_mgr) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels, and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver; whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to GPU_manager.&lt;br /&gt;
* Wrap the appropriate functionality in an ASCEND solver interface that decouples the GPU manager from the solver (the interface should provide batch residual and Jacobian evaluation for groups of relations).&lt;br /&gt;
* Benchmark the results and possibly port the approach to other many-core or multi-core architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;fill in here&#039;&#039;&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the machine hosting ASCEND should have an NVIDIA CUDA-enabled GPU (preferably Fermi or a newer architecture).&lt;br /&gt;
&lt;br /&gt;
The CUDA SDK and developer driver are necessary for executing BinCUDAs, and the BinCUDA Makefile should point to the SDK directory, as explained below.&lt;br /&gt;
&lt;br /&gt;
=== Install CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK and samples on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar. Replace the 32-bit Ubuntu 10.04 file addresses used here with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1 and log in. Then stop X, run the driver installer, and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. You can then install the CUDA 3.2 toolkit and SDK samples. Using the default directories (/usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK) is recommended:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to ~/.bashrc:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) You should now be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to point to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. You can adjust the number of relations in the model by multiplying its two size parameters, 100 and 51, by the same factor. For example, {{srcbranchdir|arash|models/test/bintok/larg_distil.a4c}} multiplies both by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be used instead of a single column ({{srcbranchdir|arash|models/test/bintok/larg_distil_2.a4c}}):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Running the test ==&lt;br /&gt;
&lt;br /&gt;
A CUnit test case was prepared to test BinCUDA generation and execution.&lt;br /&gt;
The code is located in test_bincuda.c ({{srcbranchdir|arash|ascend/compiler/test/test_bincuda.c}}).&lt;br /&gt;
You can run the test by executing &amp;quot;test/test compiler_bincuda.gen&amp;quot; from the top-level&lt;br /&gt;
ASCEND directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test case reports the CPU-based evaluation time, the GPU-based evaluation&lt;br /&gt;
time, and the number of equations in the model.&lt;br /&gt;
It generates the BinCUDA code in the &amp;quot;/tmp&amp;quot; directory. A Makefile in that same directory&lt;br /&gt;
builds the shared binary object for the BinCUDAs. The CUDA&lt;br /&gt;
build and compile commands are defined in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}).&lt;br /&gt;
&lt;br /&gt;
To change the benchmark model, change the FILENAMESTEM macro&lt;br /&gt;
in the test code. [Note that if your model uses any ASCEND-specific&lt;br /&gt;
function (e.g. asc_ipow), that function should also be defined in the&lt;br /&gt;
btcudapl.cu file ({{srcbranchdir|arash|ascend/bintokens/bincuda/btcudapl.cu}}).]&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2442</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2442"/>
		<updated>2011-05-25T09:36:57Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Set CUDA SDK in Makefile */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) to bintokens.&lt;br /&gt;
** Reinstate bintoken functionality.&lt;br /&gt;
** Add gradient calculation support to bintokens.&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver with this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (accelerator_mgr) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels, and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver; whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to GPU_manager.&lt;br /&gt;
* Wrap the appropriate functionality in an ASCEND solver interface that decouples the GPU manager from the solver (the interface should provide batch residual and Jacobian evaluation for groups of relations).&lt;br /&gt;
* Benchmark the results and possibly port the approach to other many-core or multi-core architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;fill in here&#039;&#039;&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the machine hosting ASCEND should have an NVIDIA CUDA-enabled GPU (preferably Fermi or a newer architecture).&lt;br /&gt;
&lt;br /&gt;
The CUDA SDK and developer driver are necessary for executing BinCUDAs, and the BinCUDA Makefile should point to the SDK directory, as explained below.&lt;br /&gt;
&lt;br /&gt;
=== Install CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK and samples on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar. Replace the 32-bit Ubuntu 10.04 file addresses used here with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1 and log in. Then stop X, run the driver installer, and restart X:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. You can then install the CUDA 3.2 toolkit and SDK samples. Using the default directories (/usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK) is recommended:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to ~/.bashrc:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) You should now be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory. For example, you should be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BinCUDA Makefile settings ===&lt;br /&gt;
&lt;br /&gt;
After installing the CUDA SDK and samples, set the CUDA_SAMPLES variable in the Makefile ({{srcbranchdir|arash|ascend/bintokens/bincuda/Makefile}}) to point to the SDK samples directory.&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The model originally has 128 unique symbolic equation forms and 19959 equation instances. You can adjust the number of relations in the model by multiplying its two size parameters, 100 and 51, by the same factor. For example, with both multiplied by 5:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, several columns can be used instead of a single column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== More information ==&lt;br /&gt;
&lt;br /&gt;
More information about the BinCUDAs is provided in the read-me file.&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2441</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2441"/>
		<updated>2011-05-25T09:26:57Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Install CUDA SDK on Linux */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) to bintokens.&lt;br /&gt;
** Reinstate bintoken functionality.&lt;br /&gt;
** Add gradient calculation support to bintokens.&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver with this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (accelerator_mgr) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels, and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver; whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to GPU_manager.&lt;br /&gt;
* Wrap the appropriate functionality in an ASCEND solver interface that decouples the GPU manager from the solver (the interface should provide batch residual and Jacobian evaluation for groups of relations).&lt;br /&gt;
* Benchmark the results and possibly port the approach to other many-core or multi-core architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;fill in here&#039;&#039;&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the machine hosting ASCEND should have an NVIDIA CUDA-enabled GPU (preferably Fermi or a newer architecture).&lt;br /&gt;
&lt;br /&gt;
The CUDA SDK and developer driver are necessary for executing BinCUDAs, and the BinCUDA Makefile should point to the SDK directory, as explained below.&lt;br /&gt;
&lt;br /&gt;
=== Install CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK and samples on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar. Replace the 32-bit Ubuntu 10.04 file addresses used here with the equivalent files from the [http://developer.nvidia.com/cuda-downloads NVIDIA website].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver installer and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, then stop the display manager (GDM), run the driver installer and restart GDM:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
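&lt;br /&gt;
You can confirm that the new driver is loaded after GDM restarts by checking the version file that the NVIDIA kernel module normally exposes under /proc. The sketch below assumes that location (/proc/driver/nvidia/version). The function name nvidia_driver_loaded is hypothetical and is not part of the NVIDIA tools.&lt;br /&gt;

```shell
#!/bin/sh
# Hypothetical helper: report whether an NVIDIA kernel driver is loaded by
# reading the version file it normally exposes. The path is an assumption;
# another file can be passed as the first argument (e.g. for testing).
nvidia_driver_loaded() {
    version_file=${1:-/proc/driver/nvidia/version}
    if [ -r "$version_file" ]; then
        # The first line normally identifies the driver version.
        head -n 1 "$version_file"
        return 0
    fi
    echo "NVIDIA driver not loaded (no $version_file)"
    return 1
}

nvidia_driver_loaded || echo "Check the driver installer output before continuing."
```

If the check fails, the installer output from step 2 is the first place to look.&lt;br /&gt;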
&lt;br /&gt;
3) X should now restart with the new NVIDIA driver. You can then install the CUDA 3.2 toolkit and the SDK samples (the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, are recommended).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
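&lt;br /&gt;
Before moving on, it can help to check that the toolkit and SDK were installed in the recommended default directories. This is a minimal sketch. check_cuda_install is a hypothetical helper, and it assumes only two things: the toolkit provides bin/nvcc, and the SDK contains the C directory used in step 5.&lt;br /&gt;

```shell
#!/bin/sh
# Hypothetical helper: check that the toolkit and SDK are in the directories
# suggested above. Both locations can be overridden by arguments.
check_cuda_install() {
    cuda_home=${1:-/usr/local/cuda}
    sdk_home=${2:-$HOME/NVIDIA_GPU_Computing_SDK}
    status=0
    if [ -x "$cuda_home/bin/nvcc" ]; then
        echo "found nvcc in $cuda_home/bin"
    else
        echo "missing: $cuda_home/bin/nvcc"
        status=1
    fi
    if [ -d "$sdk_home/C" ]; then
        echo "found SDK samples in $sdk_home/C"
    else
        echo "missing: $sdk_home/C"
        status=1
    fi
    return $status
}

check_cuda_install || echo "Re-run the installers or pass the directories you chose."
```

If you chose non-default directories, pass them as arguments, for example check_cuda_install /opt/cuda /opt/sdk.&lt;br /&gt;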
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following lines to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
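&lt;br /&gt;
Appending the lines above unconditionally has two side effects. Each time ~/.bashrc is sourced, duplicate entries are added. If LD_LIBRARY_PATH starts out empty, it gains a leading colon, which is an empty entry that the glibc loader may interpret as the current directory. The sketch below is one alternative that could replace those lines. It uses a hypothetical helper, path_with, which is not part of CUDA.&lt;br /&gt;

```shell
#!/bin/sh
# Hypothetical helper: print the colon-separated LIST with DIR appended,
# unless DIR is already one of its entries. An empty LIST yields just DIR,
# so no empty (current-directory) entry is created.
path_with() {
    list=$1
    dir=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;
        "::") printf '%s\n' "$dir" ;;
        *) printf '%s\n' "$list:$dir" ;;
    esac
}

PATH=$(path_with "$PATH" /usr/local/cuda/bin)
LD_LIBRARY_PATH=$(path_with "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib)
export PATH LD_LIBRARY_PATH
```

Because each directory is only appended when missing, sourcing ~/.bashrc repeatedly leaves both variables unchanged after the first time.&lt;br /&gt;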
&lt;br /&gt;
5) You should now be able to compile the SDK samples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in the &amp;quot;bin&amp;quot; directory (under bin/linux/release). For example, you should be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
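&lt;br /&gt;
The nbody sample opens a graphics window, so a non-interactive check can be handy on a remote machine. The hypothetical run_sdk_samples helper below runs whichever named samples were built. deviceQuery is used as an example and is assumed to be part of your SDK build. Each sample gets an empty standard input, in case it waits for ENTER before exiting.&lt;br /&gt;

```shell
#!/bin/sh
# Hypothetical helper: run each named SDK sample that exists in BIN_DIR.
# Stdin comes from an empty pipe so that a sample which waits for ENTER
# before exiting (as some older SDK samples may do) does not block.
run_sdk_samples() {
    bin_dir=$1
    shift
    failed=0
    for sample in "$@"; do
        if [ -x "$bin_dir/$sample" ]; then
            echo "== running $sample"
            : | "$bin_dir/$sample" || failed=1
        else
            echo "not built: $bin_dir/$sample"
            failed=1
        fi
    done
    return $failed
}

run_sdk_samples "$HOME/NVIDIA_GPU_Computing_SDK/C/bin/linux/release" deviceQuery || echo "Sample check failed; see messages above."
```

The helper returns a non-zero status if any requested sample is missing or exits with an error, so it can be used in a larger setup script.&lt;br /&gt;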
&lt;br /&gt;
=== Set CUDA SDK in Makefile ===&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations can be adjusted by multiplying the two numeric parameters, 100 and 51, by a common factor. For example, a factor of 5 gives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
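&lt;br /&gt;
Scaling by hand is error-prone for larger factors, so the arithmetic can be scripted. The sketch below uses a hypothetical helper, scaled_distil_model. It prints the model above with both numeric arguments of demo_column multiplied by a given factor, and a factor of 5 reproduces the 500 and 255 shown. It does not predict the resulting equation count; read that from ASCEND after compiling the model.&lt;br /&gt;

```shell
#!/bin/sh
# Hypothetical helper (not part of ASCEND): print the large distillation
# model with both numeric demo_column arguments (100 and 51) multiplied
# by FACTOR, following the scaling rule described above.
scaled_distil_model() {
    factor=$1
    first=$((100 * factor))
    second=$((51 * factor))
    printf '%s\n' 'REQUIRE "column.a4l";'
    printf '%s\n' 'MODEL larg_distil() REFINES test_demo_column();'
    printf '%s\n' '        demo IS_A'
    printf "        demo_column(['n_butane','n_pentane','n_hexane','n_heptane','n_octane','n_nonane','n_decane'],'n_decane',%s,%s);\n" "$first" "$second"
    printf '%s\n' 'METHODS'
    printf '%s\n' 'END larg_distil;'
}

# Factor 5 reproduces the 500/255 example above.
scaled_distil_model 5
```

For example, running scaled_distil_model 10 and redirecting its output to a .a4c file gives a model with arguments 1000 and 510.&lt;br /&gt;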
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
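&lt;br /&gt;
The instance list in the IS_A statement grows with the number of columns. The minimal sketch below uses a hypothetical helper, multi_column_model, to print the model above for N columns. It reuses the original model name c5_10_demo_column and the same demo_column arguments.&lt;br /&gt;

```shell
#!/bin/sh
# Hypothetical helper (not part of ASCEND): print the multi-column model
# with N instances named demo, demo2, ..., demoN, as in the example above.
multi_column_model() {
    n=$1
    names=demo
    i=2
    while [ "$i" -le "$n" ]; do
        names="$names,demo$i"
        i=$((i + 1))
    done
    printf '%s\n' 'REQUIRE "column.a4l";'
    printf '%s\n' 'MODEL c5_10_demo_column() REFINES test_demo_column();'
    printf '        %s IS_A\n' "$names"
    printf '%s\n' "        demo_column(['n_butane','n_pentane','n_hexane','n_heptane','n_octane','n_nonane','n_decane'],'n_decane',100,51);"
    printf '%s\n' 'METHODS'
    printf '%s\n' 'END c5_10_demo_column;'
}

# Four columns reproduce the example above.
multi_column_model 4
```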
&lt;br /&gt;
== More information ==&lt;br /&gt;
&lt;br /&gt;
More information about BinCUDAs is provided in the README file.&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2440</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2440"/>
		<updated>2011-05-25T09:17:52Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Install CUDA SDK on Linux */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) as bintokens.&lt;br /&gt;
** reinstate bintoken functionality&lt;br /&gt;
** add gradient calculation support to bintokens&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver with this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (accelerator_mgr) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver: whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to GPU_manager.&lt;br /&gt;
* Wrap the appropriate functionality in an ASCEND solver interface that decouples the GPU manager from the solver. The interface should provide batch residual (and Jacobian) evaluation for groups of relations.&lt;br /&gt;
* Benchmark the results and possibly port to other many-core (or multi-core) architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;fill in here&#039;&#039;&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the machine hosting ASCEND needs a CUDA-enabled NVIDIA GPU (preferably Fermi architecture or newer).&lt;br /&gt;
&lt;br /&gt;
The CUDA SDK and the developer driver are necessary to execute BinCUDAs, and the BinCUDA Makefile must point to the SDK directory, as explained below.&lt;br /&gt;
&lt;br /&gt;
=== Install CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK (and samples) on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; replace the Ubuntu 10.04 32-bit files below with the equivalent distribution files from the [[NVIDIA site]].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver installer and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, then stop the display manager (GDM), run the driver installer and restart GDM:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the NVIDIA developer driver, and you can install the CUDA 3.2 toolkit and samples (the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, are recommended).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH by appending the following text to your ~/.bashrc file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
PATH=$PATH:/usr/local/cuda/bin&lt;br /&gt;
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib&lt;br /&gt;
export PATH&lt;br /&gt;
export LD_LIBRARY_PATH &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
5) You should now be able to compile the SDK samples. To test this, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables will be created under bin/linux/release. For example, you should be able to run the N-Body simulation sample:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release&lt;br /&gt;
./nbody&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Set CUDA SDK in Makefile ===&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations can be adjusted by multiplying the two numeric parameters, 100 and 51, by a common factor. For example, a factor of 5 gives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== More information ==&lt;br /&gt;
&lt;br /&gt;
More information about BinCUDAs is provided in the README file.&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2439</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2439"/>
		<updated>2011-05-25T09:01:32Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Installation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) as bintokens.&lt;br /&gt;
** reinstate bintoken functionality&lt;br /&gt;
** add gradient calculation support to bintokens&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver with this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (accelerator_mgr) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver: whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to GPU_manager.&lt;br /&gt;
* Wrap the appropriate functionality in an ASCEND solver interface that decouples the GPU manager from the solver. The interface should provide batch residual (and Jacobian) evaluation for groups of relations.&lt;br /&gt;
* Benchmark the results and possibly port to other many-core (or multi-core) architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;fill in here&#039;&#039;&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the machine hosting ASCEND needs a CUDA-enabled NVIDIA GPU (preferably Fermi architecture or newer).&lt;br /&gt;
&lt;br /&gt;
The CUDA SDK and the developer driver are necessary to execute BinCUDAs, and the BinCUDA Makefile must point to the SDK directory, as explained below.&lt;br /&gt;
&lt;br /&gt;
=== Install CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA SDK (and samples) on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; replace the Ubuntu 10.04 32-bit files below with the equivalent distribution files from the [[NVIDIA site]].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver installer and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=Bash&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, then stop the display manager (GDM), run the driver installer and restart GDM:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=Bash&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the NVIDIA developer driver, and you can install the CUDA 3.2 toolkit and samples (the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, are recommended).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=Bash&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run &lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH.&lt;br /&gt;
&lt;br /&gt;
5) You should now be able to compile the SDK samples. To test this, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=Bash&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in bin/linux/release. For example, you should be able to run the N-Body simulation sample (nbody).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Set CUDA SDK in Makefile ===&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations can be adjusted by multiplying the two numeric parameters, 100 and 51, by a common factor. For example, a factor of 5 gives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== More information ==&lt;br /&gt;
&lt;br /&gt;
More information about BinCUDAs is provided in the README file.&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2438</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2438"/>
		<updated>2011-05-25T08:49:39Z</updated>

		<summary type="html">&lt;p&gt;Arash: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
Development branch: {{srcbranchdir|arash|}}&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) as bintokens.&lt;br /&gt;
** reinstate bintoken functionality&lt;br /&gt;
** add gradient calculation support to bintokens&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver with this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (accelerator_mgr) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver: whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to GPU_manager.&lt;br /&gt;
* Wrap the appropriate functionality in an ASCEND solver interface that decouples the GPU manager from the solver. The interface should provide batch residual (and Jacobian) evaluation for groups of relations.&lt;br /&gt;
* Benchmark the results and possibly port to other many-core (or multi-core) architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;fill in here&#039;&#039;&lt;br /&gt;
== Installation ==&lt;br /&gt;
&lt;br /&gt;
To run BinCUDA objects, the machine hosting ASCEND needs a CUDA-enabled NVIDIA GPU (preferably Fermi architecture or newer).&lt;br /&gt;
&lt;br /&gt;
BinCUDAs need the CUDA SDK and the developer driver, and their Makefile must point to the SDK directory:&lt;br /&gt;
&lt;br /&gt;
=== Install CUDA SDK on Linux ===&lt;br /&gt;
&lt;br /&gt;
The following step-by-step instructions install the CUDA driver, SDK and samples on a 32-bit Ubuntu 10.04 machine. The installation process on other Linux distributions is quite similar; replace the Ubuntu 10.04 32-bit files below with the appropriate distribution files from the [[NVIDIA site]].&lt;br /&gt;
&lt;br /&gt;
1) In a terminal window, download the developer driver installer and make it executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
chmod +x ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2) Switch to a text console by pressing CTRL+ALT+F1, log in, then stop the display manager (GDM), run the driver installer and restart GDM:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
sudo /etc/init.d/gdm stop&lt;br /&gt;
sudo ./devdriver_3.2_linux_32_260.19.26.run&lt;br /&gt;
sudo /etc/init.d/gdm start&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3) X should now restart with the NVIDIA developer driver, and you can install the CUDA 3.2 toolkit and samples (the default directories, i.e. /usr/local/cuda and ~/NVIDIA_GPU_Computing_SDK, are recommended).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
chmod +x ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
sudo ./cudatoolkit_3.2.16_linux_32_ubuntu10.04.run&lt;br /&gt;
wget http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
chmod +x ./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
./gpucomputingsdk_3.2.16_linux.run&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
4) Add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib to LD_LIBRARY_PATH.&lt;br /&gt;
&lt;br /&gt;
5) You should now be able to compile the SDK samples. To test this, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=sh&amp;gt;&lt;br /&gt;
cd ~/NVIDIA_GPU_Computing_SDK/C&lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
6) The sample executables are created in bin/linux/release. For example, you should be able to run the N-Body simulation sample (nbody).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Set CUDA SDK in Makefile ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations can be adjusted by multiplying the two numeric parameters, 100 and 51, by a common factor. For example, a factor of 5 gives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== More information ==&lt;br /&gt;
&lt;br /&gt;
More information about BinCUDAs is provided in the README file.&lt;br /&gt;
&lt;br /&gt;
[[Category:GSOC2011]]&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2111</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2111"/>
		<updated>2011-04-07T22:51:17Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Test models */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) as bintokens.&lt;br /&gt;
** reinstate bintoken functionality&lt;br /&gt;
** add gradient calculation support to bintokens&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver with this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (GPU_manager) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver: whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to GPU_manager.&lt;br /&gt;
* Wrap the appropriate functionality in an ASCEND solver interface that decouples the GPU manager from the solver. The interface should provide batch residual (and Jacobian) evaluation for groups of relations.&lt;br /&gt;
* Benchmark the results and possibly port to other many-core (or multi-core) architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;fill in here&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
A distillation column model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Number of Equations ===&lt;br /&gt;
The original model has 128 unique symbolic equation forms and 19959 equation instances. The number of relations can be adjusted by multiplying the two numeric parameters, 100 and 51, by a common factor. For example, a factor of 5 gives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, multiple columns can be used instead of a single column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2110</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2110"/>
		<updated>2011-04-07T22:37:55Z</updated>

		<summary type="html">&lt;p&gt;Arash: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) as bintokens.&lt;br /&gt;
** reinstate bintoken functionality&lt;br /&gt;
** add gradient calculation support to bintokens&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver with this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (GPU_manager) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver: whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to GPU_manager.&lt;br /&gt;
* Wrap the appropriate functionality in an ASCEND solver interface that decouples the GPU manager from the solver. The interface should provide batch residual (and Jacobian) evaluation for groups of relations.&lt;br /&gt;
* Benchmark the results and possibly port to other many-core (or multi-core) architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;fill in here&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
The following test model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Large Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Adjusting the Number of Equations ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL larg_distil() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,500,255);&lt;br /&gt;
METHODS&lt;br /&gt;
END larg_distil;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c5_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo,demo2,demo3,demo4 IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c5_10_demo_column;&amp;lt;/source&amp;gt;&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2102</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2102"/>
		<updated>2011-04-07T15:20:02Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Test models */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) as bintokens.&lt;br /&gt;
** Reinstate bintoken functionality.&lt;br /&gt;
** Add gradient calculation support to the bintoken code.&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver using this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (GPU_manager) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels, and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver: whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to GPU_manager.&lt;br /&gt;
* Wrap the appropriate functionality in the ASCEND solver interface so that the GPU manager is decoupled from the solver. (The interface should provide batch residual and Jacobian evaluation for groups of relations.)&lt;br /&gt;
* Benchmark the results and possibly move to other many-core (or multi-core) architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;fill in here&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
The following test model, proposed by Ben Allan, was created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Large Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c4_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c4_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Adjusting Number of Equations ===&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2101</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2101"/>
		<updated>2011-04-07T15:18:50Z</updated>

		<summary type="html">&lt;p&gt;Arash: /* Large Distillation Column Model */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) as bintokens.&lt;br /&gt;
** Reinstate bintoken functionality.&lt;br /&gt;
** Add gradient calculation support to the bintoken code.&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver using this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (GPU_manager) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels, and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver: whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to GPU_manager.&lt;br /&gt;
* Wrap the appropriate functionality in the ASCEND solver interface so that the GPU manager is decoupled from the solver. (The interface should provide batch residual and Jacobian evaluation for groups of relations.)&lt;br /&gt;
* Benchmark the results and possibly move to other many-core (or multi-core) architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;fill in here&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
The following test models, proposed by Ben Allan, were created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Large Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c4_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c4_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Adjusting Number of Equations ===&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
	<entry>
		<id>https://ascend4.org/index.php?title=User:Arash&amp;diff=2100</id>
		<title>User:Arash</title>
		<link rel="alternate" type="text/html" href="https://ascend4.org/index.php?title=User:Arash&amp;diff=2100"/>
		<updated>2011-04-07T15:16:05Z</updated>

		<summary type="html">&lt;p&gt;Arash: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Arash Sadrieh&#039;&#039;&#039; is working on developing GPU-based solvers for ASCEND. He is a PhD student at Murdoch University in Western Australia.&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
&lt;br /&gt;
* Make ASCEND export model evaluators (residuals and Jacobian) as bintokens.&lt;br /&gt;
** Reinstate bintoken functionality.&lt;br /&gt;
** Add gradient calculation support to the bintoken code.&lt;br /&gt;
* Prepare a large model (preferably 100,000+ equations) and a unit test for verifying and benchmarking the NLA solver using this model.&lt;br /&gt;
* Develop a CUDA code generator that creates GPU-based bintokens.&lt;br /&gt;
* Create a new library in ASCEND (GPU_manager) responsible for managing all GPU-related tasks, including data transfer between host and GPU, launching bintoken CUDA kernels, and parallel calculation of the residual norm (required by the line-search algorithm).&lt;br /&gt;
* Fork a new NLA solver from the current solver: whenever the new solver needs to evaluate a block residual or Jacobian, the call is redirected to GPU_manager.&lt;br /&gt;
* Wrap the appropriate functionality in the ASCEND solver interface so that the GPU manager is decoupled from the solver. (The interface should provide batch residual and Jacobian evaluation for groups of relations.)&lt;br /&gt;
* Benchmark the results and possibly move to other many-core (or multi-core) architectures and languages.&lt;br /&gt;
&lt;br /&gt;
== Progress ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;fill in here&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Test models ==&lt;br /&gt;
&lt;br /&gt;
The following test models, proposed by Ben Allan, were created to test the GPU-based bintokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Large Distillation Column Model ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;a4c&amp;quot;&amp;gt;REQUIRE &amp;quot;column.a4l&amp;quot;;&lt;br /&gt;
MODEL c4_10_demo_column() REFINES test_demo_column();&lt;br /&gt;
        demo IS_A&lt;br /&gt;
        demo_column([&#039;n_butane&#039;,&#039;n_pentane&#039;,&#039;n_hexane&#039;,&#039;n_heptane&#039;,&#039;n_octane&#039;,&#039;n_nonane&#039;,&#039;n_decane&#039;],&#039;n_decane&#039;,100,51);&lt;br /&gt;
METHODS&lt;br /&gt;
END c4_10_demo_column;&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Adjusting Number of Equations ===&lt;/div&gt;</summary>
		<author><name>Arash</name></author>
	</entry>
</feed>