CN105335215B - A cloud computing-based Monte Carlo simulation acceleration method and system

Info

Publication number
CN105335215B
Authority
CN
China
Legal status
Active
Application number
CN201510885304.5A
Other languages
Chinese (zh)
Other versions
CN105335215A (en)
Inventor
刘仰川
高欣
Current Assignee
Suzhou Institute of Biomedical Engineering and Technology of CAS
Original Assignee
Suzhou Institute of Biomedical Engineering and Technology of CAS

Abstract

The present invention relates to a cloud computing-based Monte Carlo simulation acceleration method and system, comprising: installing Hadoop and Monte Carlo software on a local computer and configuring Hadoop to run in pseudo-distributed mode; writing a MapReduce program on the local computer and preparing the simulation input text; building, in the cloud, a machine image with Hadoop and the Monte Carlo software installed, instantiating a number of virtual servers from that image, and configuring Hadoop on all cloud virtual servers to run in fully distributed mode, forming a Hadoop cluster; and uploading the local MapReduce program and simulation input text to a virtual server and running MapReduce on the cloud Hadoop cluster to perform the distributed computation of the Monte Carlo simulation. The invention allows the number and configuration of virtual servers to be chosen flexibly and can be used from any location with network access.

Description

A cloud computing-based Monte Carlo simulation acceleration method and system
Technical field
The invention belongs to the field of cloud computing technology, and in particular relates to a cloud computing-based Monte Carlo simulation acceleration method and system.
Background
The Monte Carlo (MC) method, also known as random sampling or the statistical testing method, differs substantially from conventional numerical methods: it is based on probability and statistics theory. Because it can faithfully describe the characteristics of a phenomenon and of a physical experiment, it solves problems that are intractable for some numerical methods, and it is therefore widely applied.
The advantages of the Monte Carlo method include: it describes phenomena of a random nature and physical experiments more realistically; it is little constrained by geometry; its convergence rate is independent of the dimensionality of the problem; it can evaluate multiple schemes and multiple unknowns simultaneously; its error is easy to determine; and its program structure is simple and easy to implement. These advantages have steadily broadened its range of application. Its main applications include particle transport problems, statistical physics, typical mathematical problems, vacuum technology, laser technology, medicine, biology, and mineral prospecting. Within particle transport, the applications mainly cover experimental nuclear physics, reactor physics, and high-energy physics.
The drawbacks of the Monte Carlo method are also pronounced: convergence is slow; the error is probabilistic; and, in particle transport problems, the result depends on the size of the system. Slow convergence is especially noticeable when Monte Carlo simulation is applied to more complex problems (or models). In addition, to improve simulation accuracy (that is, closeness to the true solution), the method requires random numbers on the order of millions or even tens of millions. Slow convergence and the huge number of random numbers make the computational load of a Monte Carlo simulation surge and the run time long, which limits the method in fields with strong real-time requirements, such as radiotherapy planning.
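The link between accuracy and sample count can be made concrete. The sketch below relies on the standard result that the statistical error of a Monte Carlo mean over N independent samples scales as sigma / sqrt(N); the function names and the example values of sigma are illustrative assumptions and do not come from the patent.

```python
import math


def standard_error(sigma: float, n_samples: int) -> float:
    """Standard error of a Monte Carlo mean built from n_samples independent draws."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return sigma / math.sqrt(n_samples)


def samples_needed(sigma: float, target_error: float) -> int:
    """Smallest N for which sigma / sqrt(N) <= target_error."""
    if sigma <= 0 or target_error <= 0:
        raise ValueError("sigma and target_error must be positive")
    return math.ceil((sigma / target_error) ** 2)


# Halving the target error quadruples the number of samples required.
print(samples_needed(1.0, 0.125))   # 64
print(samples_needed(1.0, 0.0625))  # 256
```

Because the cost grows with the square of the desired accuracy, sample counts in the millions arise quickly, which is the motivation for spreading the work over many computing units.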
In the Monte Carlo method, the random processes of the individual random numbers (or particles) in the model are mutually independent, so they can be computed in parallel. In general, any Monte Carlo program can be parallelized: the computing task is decomposed into subtasks that can be computed separately, each subtask is handed to a computing unit to obtain a sub-result, and the sub-results are then merged. Monte Carlo simulation acceleration methods build on this property.
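The decompose, compute, and merge pattern described above can be sketched with a toy problem. The example estimates pi by Monte Carlo integration; the choice of problem, the function names, and the simple per-subtask seeding scheme are assumptions made for illustration. The subtasks here run one after another, but each depends only on its own seed, so they could equally run on separate computing units.

```python
import random
from typing import List, Tuple


def run_subtask(seed: int, n_samples: int) -> Tuple[int, int]:
    """One independent subtask: count random points that fall inside the unit quarter circle."""
    rng = random.Random(seed)  # private generator, so subtasks do not share state
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits, n_samples


def merge(results: List[Tuple[int, int]]) -> float:
    """Merge the sub-results (hits, samples) into one estimate of pi."""
    total_hits = sum(hits for hits, _ in results)
    total_samples = sum(n for _, n in results)
    return 4.0 * total_hits / total_samples


def estimate_pi(n_subtasks: int, samples_per_subtask: int, base_seed: int = 0) -> float:
    """Decompose the task, compute every subtask, then merge the sub-results."""
    results = [run_subtask(base_seed + i, samples_per_subtask) for i in range(n_subtasks)]
    return merge(results)


print(estimate_pi(n_subtasks=4, samples_per_subtask=10_000))
```

The merge step only needs the per-subtask counts, which is what makes the pattern a natural fit for a map phase followed by a reduce phase.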
Existing Monte Carlo simulation acceleration methods include:
(1) Monte Carlo simulation acceleration based on CPU clusters
In high-performance computing, CPU cluster computing was developed early. Most of the more mature domestic supercomputers use arrays of hundreds of CPUs. MPI is currently the most important means of parallel programming; it defines a portable set of programming interfaces, so a programmer only has to design the parallel algorithm and call the relevant functions of the MPI library to run it on multiple computing units. For example, Lu Yune of the Changsha Institute of Technology, in the paper "MPI parallel computing based on a microcomputer cluster system", used MPI parallel programming to accelerate the computation of the value of pi by Monte Carlo integration. The experiments showed that solving with a parallel program on a multi-computer cluster is faster than solving on a single computer. As another example, Fu You of Shandong University of Science and Technology, in the paper "Research and implementation of an interactive parallelization system for the direct simulation Monte Carlo method for low-density gases", used an 8-node cluster system to carry out interactive parallel computation with the direct simulation Monte Carlo (DSMC) method for low-density gases. The author converted a serial DSMC source program written in Fortran 77 into a parallel source program for the MPI environment and obtained a good speedup.
(2) Monte Carlo simulation acceleration based on GPUs (GPU clusters)
GPU stands for Graphics Processing Unit. The GPU is the "brain" of a graphics card, a dedicated graphics core processor. Its most notable characteristic is its powerful computing capability, which can exceed that of a CPU many times over. Making effective use of idle GPU time and exploiting the GPU's potential has become a focus of industry attention, known as "GPU hardware acceleration". The speed, parallelism, and programmability of the GPU provide a good platform for general-purpose computing beyond image processing, and GPUs have been applied successfully to algebraic computation, the solution of partial differential equations, and other areas. CUDA (Compute Unified Device Architecture) is a general-purpose parallel computing architecture released by the graphics card manufacturer NVIDIA and is currently in wide use. CUDA comprises an instruction set architecture (ISA) and the parallel computing engine inside the GPU; developers can write programs in C that run with very high performance on CUDA-capable GPUs. For example, a doctoral thesis from the Central China University of Science and Technology, "Theoretical and experimental study of reconstruction algorithms for steady-state fluorescence molecular tomography", proposed a GPU-cluster-accelerated, Monte Carlo-based reconstruction algorithm for steady-state fluorescence molecular tomography (Fluorescence Molecular Tomography, FMT). The method used three computers with NVIDIA graphics cards on a local area network to build a GPU cluster via the Message Passing Interface (MPI), distributing the total computing task evenly over the three compute nodes so that multiple GPUs computed in parallel. The NVIDIA graphics cards in the three computers were G200 models, and the GPU program was written in CUDA. Using GPU cluster acceleration, the author addressed the large time cost of the MC method in FMT reconstruction and obtained a good speedup. As another example, He Yongxiang of the Central China University of Science and Technology, in the paper "Efficient GPU parallel computing for direct simulation Monte Carlo in aerodynamics", implemented CUDA-based parallel computation of direct simulation Monte Carlo for aerodynamics. Using NVIDIA Tesla C2075 GPUs, the author ran acceleration experiments with one, two, and multiple GPUs and, relative to CPU computation, obtained a good speedup while preserving computational accuracy.
Existing Monte Carlo simulation acceleration methods have the following drawbacks:
(1) When developing GPU-accelerated parallel programs in languages such as CUDA, developers must think in parallel terms; the resulting programs are hard to debug and must be optimized repeatedly before they achieve a good speedup.
(2) Purchasing a supercomputer to obtain a CPU cluster is expensive, while a small CPU cluster built from several computers has limited computing power. Parallel program development with MPI requires developers to manage memory and threads themselves, which is difficult. In addition, a local computer cluster needs continual maintenance, which adds personnel cost.
Summary of the invention
The present invention provides a cloud computing-based Monte Carlo simulation acceleration method and system, intended to solve, at least to some extent, one of the above technical problems in the prior art.
The invention is implemented as follows. A cloud computing-based Monte Carlo simulation acceleration method comprises the following steps:
Step a: install Hadoop and Monte Carlo software on a local computer, and configure Hadoop to run in pseudo-distributed mode;
Step b: write a MapReduce program on the local computer, and prepare the simulation input text;
Step c: build, in the cloud, a machine image with Hadoop and the Monte Carlo software installed; instantiate a number of virtual servers from that image; and configure Hadoop on all cloud virtual servers to run in fully distributed mode, forming a Hadoop cluster;
Step d: upload the local MapReduce program and simulation input text to a virtual server, and run MapReduce on the cloud Hadoop cluster to perform the distributed computation of the Monte Carlo simulation.
The technical solution adopted by the embodiments of the present invention further includes: in step b, writing the MapReduce program specifically comprises:
Step b1: write the map program, which in order reads a simulation task from standard input, calls the Monte Carlo program to perform the simulation, and writes the result to standard output as a key-value pair;
Step b2: write the reduce program, which in order reads the simulation results that share the same key from standard input, merges them, and writes the merged result to standard output as a key-value pair;
Step b3: write the Hadoop Streaming job script, which specifies the input and output stream formats of the map and reduce programs, the numbers of Map and Reduce tasks, the input text name, the output path, the mapper and reducer file names, and the path of the files to upload.
The technical solution adopted by the embodiments of the present invention further includes: in step b, preparing the simulation input text specifically comprises:
Step b4: if the input of the Monte Carlo software is random numbers, generate the random numbers needed for the actual simulation; if the input is a program file, generate the program file needed for the actual simulation;
Step b5: according to the scale of the parallel computation to be performed, divide the random numbers into groups or decompose the program file, so that each group of random numbers or each sub-program file corresponds to one parallel simulation;
Step b6: write the paths of the random number files or program files, one per line, into a text file that serves as the input file.
The technical solution adopted by the embodiments of the present invention further includes: step b further comprises running Hadoop Streaming on the local computer to debug the MapReduce program and verify the simulation input text.
The technical solution adopted by the embodiments of the present invention further includes: in step c, configuring Hadoop on all cloud virtual servers to run in fully distributed mode specifically comprises: selecting one virtual server as the Master and one as the Secondary NameNode, with the remaining virtual servers as Workers; on the local computer or on any cloud virtual server, using the SSH protocol, modifying the Hadoop configuration files in turn according to virtual server type and transferring them to the corresponding virtual servers to replace the configuration files in their original locations; and performing the Hadoop initialization on the Master virtual server so that Hadoop runs in fully distributed mode, forming a Hadoop cluster.
The technical solution adopted by the embodiments of the present invention further includes: in step d, running MapReduce in the cloud to perform the distributed computation of the Monte Carlo simulation specifically comprises: running the Hadoop Streaming job script, whereupon MapReduce automatically runs the map program and the reduce program on different Worker virtual servers as Map tasks and Reduce tasks; in a Map task, the map program reads the Monte Carlo simulation task, performs the simulation, and outputs the intermediate result; in a Reduce task, the reduce program reads the intermediate results, merges them, and outputs the merged result; and the running state of the cluster is monitored through the monitoring page provided by Hadoop.
The technical solution adopted by the embodiments of the present invention further includes: step d further comprises, after the simulation has finished, downloading the simulation results from the cloud to the local computer.
Another technical solution adopted by the embodiments of the present invention is a cloud computing-based Monte Carlo simulation acceleration system, comprising a mode configuration module, a function writing module, a text writing module, an image creation module, a cluster configuration module, a data transmission module, and a simulation computing module;
The mode configuration module is used to install Hadoop and Monte Carlo software on the local computer and to configure Hadoop to run in pseudo-distributed mode;
The function writing module is used to write, on the local computer, the MapReduce program for the Monte Carlo simulation;
The text writing module is used to prepare the simulation input text on the local computer;
The image creation module is used to build, in the cloud, a machine image with Hadoop and the Monte Carlo software installed, and to instantiate a number of virtual servers from that image;
The cluster configuration module is used to configure Hadoop on all cloud virtual servers to run in fully distributed mode, forming a Hadoop cluster;
The data transmission module is used to upload the MapReduce program and the simulation input text from the local computer to a virtual server;
The simulation computing module is used to run MapReduce on the cloud Hadoop cluster and perform the distributed computation of the Monte Carlo simulation.
The technical solution adopted by the embodiments of the present invention further includes a function debugging module, which is used to run Hadoop Streaming on the local computer to debug the MapReduce program and verify the simulation input text.
The technical solution adopted by the embodiments of the present invention further includes a data download module, which is used to download the simulation results from the cloud to the local computer after the simulation has finished.
The cloud computing-based Monte Carlo simulation acceleration method and system of the embodiments of the present invention build a Hadoop cluster from virtual servers provided by a cloud platform and rely on the MapReduce framework for distributed computing. The user only has to implement, in custom map and reduce functions, steps such as calling the Monte Carlo program and processing the intermediate results; running them on the Hadoop cluster then accelerates the Monte Carlo simulation on the distributed computing framework MapReduce. Because the computation is carried out in the cloud, the number and configuration of virtual servers can be chosen flexibly, pay-per-time billing keeps the cost under control, and the system can be used from any location with network access.
Brief description of the drawings
Fig. 1 is a flow chart of the cloud computing-based Monte Carlo simulation acceleration method of an embodiment of the present invention;
Fig. 2 is a flow chart of the method of writing a MapReduce program on the local computer in an embodiment of the present invention;
Fig. 3 is a flow chart of the method of preparing the simulation input text on the local computer in an embodiment of the present invention;
Fig. 4 is a schematic diagram of cloud computing service forms;
Fig. 5 is a structural diagram of the cloud computing-based Monte Carlo simulation acceleration system of an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
Fig. 1 is a flow chart of the cloud computing-based Monte Carlo simulation acceleration method of an embodiment of the present invention. The method comprises the following steps:
Step 100: install Hadoop and Monte Carlo software on the local computer, and configure Hadoop to run in pseudo-distributed mode;
In step 100, Hadoop is a distributed computing architecture developed by the Apache Foundation and is open-source software. With Hadoop, users can develop distributed programs without understanding the low-level details of distribution and can make full use of the power of a cluster for high-speed computation and storage. The core of Hadoop comprises HDFS (Hadoop Distributed File System) and the MapReduce framework: HDFS provides storage for massive data, and the MapReduce framework provides computation over massive data.
Hadoop has three operating modes:
(1) Standalone mode (Standalone Mode)
Standalone mode is Hadoop's default mode, in which Hadoop runs entirely locally. Because there is no need to interact with other nodes, standalone mode does not use HDFS and does not start any Hadoop daemon. This mode is mainly used to develop and debug the application logic of a MapReduce program.
(2) Pseudo-distributed mode (Pseudo-Distributed Mode)
In pseudo-distributed mode, Hadoop runs on a "single-node cluster": all daemons run on the same machine, and HDFS is built on the local file system. Compared with standalone mode, this mode adds code debugging capabilities, allowing memory usage, HDFS input and output, and the interactions with the other daemons to be inspected. A program debugged in this mode can run in fully distributed mode without modification.
(3) Fully distributed mode (Fully Distributed Mode)
The Hadoop daemons run on a cluster, providing true distributed computing and storage. The daemons run on different machines, and HDFS is likewise built across different machines. Machines in the cluster take one of three roles: the Master machine schedules tasks, the Secondary NameNode machine backs up the critical data of the Master machine, and the Worker machines carry out the data processing tasks.
Step 200: write, on the local computer, the MapReduce program for the Monte Carlo simulation;
To explain step 200 clearly, refer also to Fig. 2, a flow chart of the method of writing a MapReduce program on the local computer in an embodiment of the present invention. The method comprises the following steps:
Step 201: write the map program, which in order reads a simulation task from standard input (stdin), calls the Monte Carlo program to perform the simulation, and writes the result to standard output (stdout) as a key-value pair (KVP);
Step 202: write the reduce program, which in order reads the simulation results that share the same key from stdin, merges them, and writes the merged result to stdout as a KVP;
Step 203: write the Hadoop Streaming job script, which specifies the input and output stream formats of the map and reduce programs, the numbers of Map and Reduce tasks, the input text name, the output path, the mapper and reducer file names, the path of the files to upload, and so on.
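Steps 201 and 202 can be sketched as two small Python functions in the Hadoop Streaming style, where each record is a line of the form key, tab, value. The following is a minimal sketch under stated assumptions, not the patent's implementation: the executable name `mc_sim`, the convention that it prints a single number, the constant key `result`, and merging by averaging (which presumes equally sized subtasks) are all hypothetical choices for illustration.

```python
import subprocess
from typing import Callable, Iterable, TextIO

# Hypothetical command line of the Monte Carlo program; replace with the real one.
MC_COMMAND = ["mc_sim"]


def run_monte_carlo(task: str) -> float:
    """Call the external Monte Carlo program on one task and parse one number from its stdout."""
    completed = subprocess.run(
        MC_COMMAND + [task], capture_output=True, text=True, check=True
    )
    return float(completed.stdout.strip())


def run_mapper(
    lines: Iterable[str],
    out: TextIO,
    simulate: Callable[[str], float] = run_monte_carlo,
    key: str = "result",
) -> None:
    """Map step: one task per input line, one tab-separated key-value pair per result."""
    for line in lines:
        task = line.strip()
        if not task:
            continue  # skip blank lines in the input text
        out.write(f"{key}\t{simulate(task)}\n")


def run_reducer(lines: Iterable[str], out: TextIO) -> None:
    """Reduce step: average the values of each key (input arrives sorted by key)."""
    current_key, total, count = None, 0.0, 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, value = line.split("\t", 1)
        if key != current_key:
            if current_key is not None:
                out.write(f"{current_key}\t{total / count}\n")
            current_key, total, count = key, 0.0, 0
        total += float(value)
        count += 1
    if current_key is not None:
        out.write(f"{current_key}\t{total / count}\n")
```

In an actual streaming job the two functions would live in separate scripts, each ending with a call such as `run_mapper(sys.stdin, sys.stdout)` or `run_reducer(sys.stdin, sys.stdout)`. Passing the simulation function as a parameter keeps the mapper testable on a machine where the Monte Carlo program is not installed.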
Step 300: prepare the simulation input text on the local computer;
To explain step 300 clearly, refer also to Fig. 3, a flow chart of the method of preparing the simulation input text on the local computer in an embodiment of the present invention. The method comprises the following steps:
Step 301: if the input of the Monte Carlo software is random numbers, generate the random numbers needed for the actual simulation; if the input is a program file, generate the program file needed for the actual simulation;
Step 302: according to the scale of the parallel computation to be performed (the number of threads), divide the random numbers into groups or decompose the program file, so that each group of random numbers or each sub-program file corresponds to one parallel simulation;
Step 303: write the paths of the random number files or program files, one per line, into a text file that serves as the input file.
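For the random-number case, steps 301 to 303 can be sketched as follows. The file names (`random_0000.txt`, `input.txt`), the round-robin grouping, and the one-number-per-line file layout are assumptions for illustration; the patent does not fix any of them.

```python
from pathlib import Path
from typing import List, Sequence, Union


def split_into_groups(values: Sequence[float], n_groups: int) -> List[List[float]]:
    """Step 302: distribute the random numbers round-robin over n_groups groups."""
    if n_groups <= 0:
        raise ValueError("n_groups must be positive")
    return [list(values[i::n_groups]) for i in range(n_groups)]


def write_input_text(
    values: Sequence[float],
    n_groups: int,
    work_dir: Union[str, Path],
    index_name: str = "input.txt",  # hypothetical name of the simulation input text
) -> Path:
    """Write one file per group, then list the file paths one per line (step 303)."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, group in enumerate(split_into_groups(values, n_groups)):
        group_file = work_dir / f"random_{i:04d}.txt"
        group_file.write_text("\n".join(repr(v) for v in group) + "\n")
        paths.append(group_file)
    index_file = work_dir / index_name
    index_file.write_text("".join(f"{p}\n" for p in paths))
    return index_file
```

A call such as `write_input_text([random.random() for _ in range(1_000_000)], 16, "sim_input")` would cover step 301 as well. Each line of the resulting text then names the input of one parallel simulation; the listed files must also be reachable from the machines that run the map tasks.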
Step 400: run Hadoop Streaming on the local computer to debug the MapReduce program and verify the simulation input text;
In step 400, implementing distributed computing with Hadoop requires developing a MapReduce application. The simplest MapReduce application contains at least a map function, a reduce function, and a main function. The general form of the map and reduce functions is:
map:(k1,v1)→list(k2,v2)
reduce:(k2,list(v2))→list(k3,v3)
Here, the map function receives a set of data and converts it into a list of key/value pairs, with each element of the input domain corresponding to one key/value pair; the reduce function receives the list produced by the map function and, grouping by key, reduces it to a list of key/value pairs (producing one key/value pair per key).
Throughout the map and reduce process, the input data comes from the underlying distributed file system HDFS, the intermediate data is placed on the local file system, and the final output data is written to HDFS.
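The two signatures above can be exercised locally without a cluster. The sketch below is a plain in-memory stand-in for the map, group-by-key, and reduce sequence; it is an illustration of the data flow only, not Hadoop's implementation, and the demo mapper and reducer are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2")
V2 = TypeVar("V2")
K3 = TypeVar("K3")
V3 = TypeVar("V3")


def map_reduce(
    records: Iterable[Tuple[K1, V1]],
    mapper: Callable[[K1, V1], List[Tuple[K2, V2]]],
    reducer: Callable[[K2, List[V2]], List[Tuple[K3, V3]]],
) -> List[Tuple[K3, V3]]:
    """Apply map: (k1, v1) -> list(k2, v2), group by k2, then reduce: (k2, list(v2)) -> list(k3, v3)."""
    grouped: Dict[K2, List[V2]] = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            grouped[k2].append(v2)
    output: List[Tuple[K3, V3]] = []
    for k2 in sorted(grouped):  # the framework delivers keys to reducers in sorted order
        output.extend(reducer(k2, grouped[k2]))
    return output


# Hypothetical demo: each input line yields one tally, and the reducer sums the tallies.
def demo_mapper(offset: int, line: str) -> List[Tuple[str, int]]:
    return [("tally", len(line))]


def demo_reducer(key: str, values: List[int]) -> List[Tuple[str, int]]:
    return [(key, sum(values))]


print(map_reduce([(0, "abc"), (4, "de")], demo_mapper, demo_reducer))  # [('tally', 5)]
```

A dry run of this kind checks the map and reduce logic before any Hadoop installation is involved, which complements the local Hadoop Streaming run of step 400.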
Step 500: build, in the cloud, a machine image with Hadoop and the Monte Carlo software installed; select a hardware configuration according to the computing requirements; and instantiate a number of virtual servers from that image;
In step 500, cloud computing (Cloud Computing) is the product of the development and fusion of traditional computer and network technologies such as grid computing (Grid Computing), distributed computing (Distributed Computing), parallel computing (Parallel Computing), utility computing (Utility Computing), network storage (Network Storage Technologies), virtualization (Virtualization), and load balancing (Load Balance). It aims to integrate, over the network, multiple relatively low-cost computing entities into one complete system with powerful computing capability, and to deliver that capability to end users through advanced business models such as Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). These three are currently the main service forms of cloud computing, as shown in Fig. 4, a schematic diagram of cloud computing service forms. The present invention builds the Hadoop cluster on the IaaS layer of a cloud platform and uses the MapReduce framework in Hadoop to perform the distributed computation of the Monte Carlo simulation. IaaS offers a "cloud" infrastructure made up of multiple servers to customers as a metered service. It pools memory, I/O devices, storage, and computing capability into a virtual resource pool that provides the industry with services such as storage resources and virtualized servers. An IaaS cloud provider can offer hosts in many configurations; the user needs to choose the machine image and the host hardware configuration.
The machine images offered by cloud providers generally do not meet the development requirements, so the environment needs further configuration. The machine image built in the cloud in the embodiments of the present invention has the Monte Carlo software and Hadoop installed and has the same environment configuration as the local computer, so that a program debugged on the local computer can run directly in the cloud. Any number of virtual servers (also called "nodes" or "instances") can be instantiated from the image, which avoids the tedium of configuring the environment one by one on a large number of instantiated virtual servers.
Step 600: configure Hadoop on all cloud virtual servers to run in fully distributed mode, forming a Hadoop cluster;
In step 600, configuring Hadoop on all cloud virtual servers to run in fully distributed mode specifically comprises: selecting one node as the Master and one as the Secondary NameNode, with the remaining nodes as Workers; on the local computer or on any cloud node, using the SSH protocol, modifying the Hadoop configuration files in turn according to node type and transferring them to the corresponding nodes to replace the configuration files in their original locations; and performing the Hadoop initialization on the Master node so that Hadoop runs in fully distributed mode, forming a Hadoop cluster.
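The role assignment and the generation of per-cluster configuration files can be sketched as below. The property names `fs.defaultFS` and `dfs.namenode.secondary.http-address` are standard Hadoop settings, but the host names, the port numbers, and the decision to take the first two hosts as Master and Secondary NameNode are assumptions for illustration; the patent does not list which files or properties it changes. The file called `workers` here is named `slaves` in Hadoop 2.x releases.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Sequence


def assign_roles(hosts: Sequence[str]) -> Dict[str, object]:
    """Pick one Master, one Secondary NameNode, and treat the remaining hosts as Workers."""
    if len(hosts) < 3:
        raise ValueError("need at least three hosts: master, secondary, and one worker")
    return {"master": hosts[0], "secondary": hosts[1], "workers": list(hosts[2:])}


def site_xml(properties: Dict[str, str]) -> str:
    """Render a Hadoop *-site.xml document from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def render_configs(
    hosts: Sequence[str],
    hdfs_port: int = 9000,             # assumed NameNode RPC port
    secondary_http_port: int = 50090,  # assumed Secondary NameNode HTTP port
) -> Dict[str, str]:
    """Return file name -> file content for the configuration files to distribute."""
    roles = assign_roles(hosts)
    workers: List[str] = roles["workers"]  # type: ignore[assignment]
    return {
        "core-site.xml": site_xml(
            {"fs.defaultFS": f"hdfs://{roles['master']}:{hdfs_port}"}
        ),
        "hdfs-site.xml": site_xml(
            {"dfs.namenode.secondary.http-address":
                 f"{roles['secondary']}:{secondary_http_port}"}
        ),
        "workers": "\n".join(workers) + "\n",
    }
```

The rendered files would then be copied over SSH to each node's Hadoop configuration directory, replacing the files from pseudo-distributed mode, before the initialization on the Master node.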
Step 700: upload the MapReduce program and the simulation input text from the local computer to the Master virtual server;
Step 800: run MapReduce on the cloud Hadoop cluster to perform the distributed computation of the Monte Carlo simulation;
In step 800, running MapReduce in the cloud to perform the distributed computation of the Monte Carlo simulation specifically comprises: running the Hadoop Streaming job script, whereupon MapReduce automatically runs the map program and the reduce program on different Worker nodes as Map tasks and Reduce tasks; in a Map task, the map program reads the Monte Carlo simulation task, performs the simulation, and outputs the intermediate result; in a Reduce task, the reduce program reads the intermediate results, merges them, and outputs the merged result; and the running state of the cluster is monitored through the monitoring page provided by Hadoop.
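The job script of step 203 essentially assembles one `hadoop jar` command line. The sketch below builds that command as an argument list. The options used (`-D`, `-files`, `-input`, `-output`, `-mapper`, `-reducer`) and the properties `mapreduce.job.maps` and `mapreduce.job.reduces` are standard Hadoop Streaming settings for Hadoop 2 and later; the location of the streaming jar and all paths in the usage note are assumptions that depend on the installation.

```python
from pathlib import PurePosixPath
from typing import List


def build_streaming_command(
    streaming_jar: str,
    input_path: str,
    output_path: str,
    mapper: str,
    reducer: str,
    n_maps: int,
    n_reduces: int,
) -> List[str]:
    """Assemble the argument list of a Hadoop Streaming job."""
    return [
        "hadoop", "jar", streaming_jar,
        # Generic options must come before the streaming-specific options.
        "-D", f"mapreduce.job.maps={n_maps}",        # only a hint to the framework
        "-D", f"mapreduce.job.reduces={n_reduces}",
        "-files", f"{mapper},{reducer}",             # ship both scripts to the workers
        "-input", input_path,
        "-output", output_path,
        "-mapper", PurePosixPath(mapper).name,
        "-reducer", PurePosixPath(reducer).name,
    ]
```

On the Master node the list could be passed to `subprocess.run(cmd, check=True)`, for example with `streaming_jar` pointing at the `hadoop-streaming-*.jar` shipped with the installation. The output directory must not already exist in HDFS, or the job will refuse to start.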
Step 900: after the simulation has finished, download the simulation results from the cloud to the local computer.
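Once the output directory has been copied from HDFS to the local computer (for example with `hdfs dfs -get`), the merged results can be read back as follows. This is a sketch that assumes the reducer wrote tab-separated key and numeric value lines and that the output files follow Hadoop's usual `part-*` naming; both are assumptions about the job rather than statements from the patent.

```python
from pathlib import Path
from typing import Dict, Union


def load_results(output_dir: Union[str, Path]) -> Dict[str, float]:
    """Read every part-* file of a downloaded job output directory into a dictionary."""
    results: Dict[str, float] = {}
    for part in sorted(Path(output_dir).glob("part-*")):
        for line in part.read_text().splitlines():
            if not line.strip():
                continue  # ignore blank lines
            key, value = line.split("\t", 1)
            results[key] = float(value)
    return results
```

Marker files such as `_SUCCESS`, which Hadoop places next to the part files, are skipped because they do not match the `part-*` pattern.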
Referring to Fig. 5, being the structural representation of the Monte Carlo simulation acceleration system based on cloud computing of the embodiment of the present invention Figure.The Monte Carlo simulation acceleration system based on cloud computing of the embodiment of the present invention includes that pattern configurations module, function write mould Block, text writing module, function debugging module, mirror image make module, cluster configuration module, data transmission module, simulation calculation Module and data download module;
Pattern configurations module for installing Hadoop and Monte Carlo software, configuration Hadoop operation on the local computer Under pseudo- distribution pattern;
Function writes module for writing the MapReduce program for Monte Carlo simulation on the local computer;Its In, it includes: to write map program that function, which writes the method that module writes MapReduce program, and program successively includes inputting from standard (stdin) read simulation calculation task, call Monte Carlo program carry out simulation calculation, by calculated result with key-value pair (Key- Value-Pair, KVP) form be written standard output (stdout);Reduce program is write, program successively includes from stdin Read the simulation result with same keys (Key), calculated result merged, by combined result in the form of KVP Stdout is written;Hadoop Streaming operation procedure is write, program includes the iostream of map and reduce program (stream) format, Map and Reduce task (task) number, input text title, outgoing route, mapper and reducer text Part name, upload file path etc..
Text writing module is for production emulation input text on the local computer;Wherein, text writing module making If the input that the method for emulation input text includes: Monte Carlo software is random number, just generate random needed for actual emulation Number;If input is program file, program file needed for just generating actual emulation;According to the scale (line of parallel computation to be carried out Number of passes), random number is grouped or program file is decomposed, every group of random number or each subprogram file are one corresponding Parallel artificial;The path of random number or program file is entered into (text) text by row write, as input file.
The function debugging module is used to run Hadoop Streaming on the local computer in order to debug the MapReduce program and verify the simulation input text. When distributed computing is implemented with Hadoop, a MapReduce application must be developed. The simplest MapReduce application contains at least a map function, a reduce function, and a main function. The map and reduce functions generally follow this format:
map:(k1,v1)→list(k2,v2)
reduce:(k2,list(v2))→list(k3,v3)
Here, the map function receives a set of data and converts it into a list of key/value pairs, with each element of the input domain corresponding to one key/value pair; the reduce function receives the list generated by the map function and then reduces the list of key/value pairs according to their keys (generating one key/value pair for each key).
Throughout the writing of the map and reduce functions, the input data comes from the underlying distributed file system HDFS, the intermediate data is placed on the local file system, and the final output data is written to HDFS.
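The map and reduce signatures given above can be illustrated with a small in-memory emulation. This is a sketch for intuition only: it runs in a single process, does not use Hadoop or HDFS, and the example task (estimating pi by random sampling) is an assumption chosen for illustration rather than a simulation named in the patent.

```python
import random
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Apply map: (k1, v1) -> list(k2, v2), group by k2, then
    reduce: (k2, list(v2)) -> list(k3, v3)."""
    intermediate = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            intermediate[k2].append(v2)
    output = []
    for k2 in sorted(intermediate):
        output.extend(reduce_fn(k2, intermediate[k2]))
    return output


def pi_map(task_id, spec):
    """One simulation task: count random points inside the unit quarter circle."""
    seed, n = spec
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return [("pi", (hits, n))]


def pi_reduce(key, values):
    """Merge all partial counts for one key into a single estimate."""
    hits = sum(h for h, _ in values)
    total = sum(n for _, n in values)
    return [(key, 4.0 * hits / total)]


tasks = [(i, (i, 2000)) for i in range(4)]  # (k1, v1) = (task id, (seed, samples))
result = map_reduce(tasks, pi_map, pi_reduce)
print(result)
```

Because every task carries its own seed, the tasks are independent and can run in any order, which is the property that lets a framework such as MapReduce spread them across nodes.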
The image production module is used to create, in the cloud, a machine image with Hadoop and the Monte Carlo software installed and, after a hardware configuration has been selected according to the computing requirements, to instantiate a certain number of virtual servers from that image.
The cluster configuration module is used to configure Hadoop on all virtual servers in the cloud to run in fully distributed mode, forming a Hadoop cluster. Specifically: one node is selected as the Master and one as the Secondary NameNode, and the remaining nodes serve as Workers; on the local computer or on any cloud node, the Hadoop configuration files are modified in turn according to node type and transferred over the SSH protocol to the corresponding nodes, where they replace the configuration files in the original location; Hadoop is then initialized on the Master node so that it runs in fully distributed mode, forming the Hadoop cluster.
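As a rough sketch of the configuration step above, the following code generates a `core-site.xml` that points every node at the Master and plans the `scp` transfers without executing them. Several details are assumptions not stated in the patent: the property name `fs.defaultFS` (used by Hadoop 2 and later), port 9000, the worker list file name `workers` (older Hadoop releases call it `slaves`), the user name `hadoop`, and the configuration directory.

```python
def core_site_xml(master_host, port=9000):
    """Render a minimal core-site.xml naming the Master as default file system.

    ASSUMPTION: fs.defaultFS and port 9000 are common choices, not values
    taken from the patent.
    """
    return (
        "<configuration>\n"
        "  <property>\n"
        "    <name>fs.defaultFS</name>\n"
        f"    <value>hdfs://{master_host}:{port}</value>\n"
        "  </property>\n"
        "</configuration>\n"
    )


def plan_distribution(master, secondary, workers, conf_dir,
                      user="hadoop", workers_file="workers"):
    """Return (files, commands): file contents to write locally and the scp
    argument lists that would copy them to each node. Nothing is executed."""
    files = {
        "core-site.xml": core_site_xml(master),
        workers_file: "\n".join(workers) + "\n",
    }
    commands = []
    for host in [master, secondary, *workers]:
        commands.append(
            ["scp", "core-site.xml", f"{user}@{host}:{conf_dir}/core-site.xml"])
    # Only the Master needs the list of Worker hosts in this sketch.
    commands.append(
        ["scp", workers_file, f"{user}@{master}:{conf_dir}/{workers_file}"])
    return files, commands


files, commands = plan_distribution(
    "master", "secondary", ["worker1", "worker2"], "/opt/hadoop/etc/hadoop")
for argv in commands:
    print(" ".join(argv))
```

Returning the commands as argument lists rather than running them keeps the sketch side-effect free; a real deployment would write the files to disk and pass each list to `subprocess.run`.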
The data transmission module is used to upload the MapReduce program and the simulation input text from the local computer to the Master virtual server.
The simulation calculation module is used to run MapReduce on the cloud Hadoop cluster to perform the distributed computation of the Monte Carlo simulation. Specifically: the Hadoop Streaming job program is run, and MapReduce automatically runs the map program and the reduce program on different Worker nodes, forming Map tasks and Reduce tasks; in a Map task, the map program reads the Monte Carlo simulation task, performs the simulation, and outputs the intermediate result; in a Reduce task, the reduce program reads the intermediate results, merges them, and outputs the merged result; the running state of the cluster is monitored through the monitoring page provided by Hadoop.
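Running the Hadoop Streaming job described above amounts to invoking the streaming jar with the mapper, reducer, input, and output. The sketch below only assembles the command line. The jar path, the script names `mapper.py` and `reducer.py`, and the HDFS paths are hypothetical, and the property `mapreduce.job.reduces` is the Hadoop 2 and later name for the number of Reduce tasks, which may differ in the Hadoop release the patent targets.

```python
import os


def streaming_command(streaming_jar, input_path, output_path,
                      mapper="mapper.py", reducer="reducer.py", n_reduces=1):
    """Build the argument list for a Hadoop Streaming job (not executed here)."""
    return [
        "hadoop", "jar", streaming_jar,
        # Generic options (-D, -files) must come before streaming options.
        "-D", f"mapreduce.job.reduces={n_reduces}",
        "-files", f"{mapper},{reducer}",  # ship both scripts to the cluster
        "-input", input_path,
        "-output", output_path,
        "-mapper", os.path.basename(mapper),
        "-reducer", os.path.basename(reducer),
    ]


cmd = streaming_command("hadoop-streaming.jar", "input.txt", "mc_out",
                        n_reduces=2)
print(" ".join(cmd))
```

The number of Map tasks is largely determined by how Hadoop splits the input, so a job that needs one task per input line may require additional input-format settings that this sketch leaves out.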
The data download module is used to download the cloud simulation results to the local computer after the simulation calculation has finished.
The cloud computing-based Monte Carlo simulation acceleration method and system of the embodiments of the present invention build a Hadoop cluster from virtual servers provided by a cloud platform and rely on the MapReduce framework to realize distributed computing. The user only needs to implement, in custom map and reduce functions, steps such as calling the Monte Carlo program and processing the intermediate results; running these functions on the Hadoop cluster then accelerates the Monte Carlo simulation on the distributed computing framework MapReduce. Because the computation is carried out in the cloud, the number and configuration of the virtual servers can be chosen flexibly, time-based billing keeps the cost controllable, and the system can be used from any place with network access.
The above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A cloud computing-based Monte Carlo simulation acceleration method, comprising the following steps:
Step a: installing Hadoop and Monte Carlo software on a local computer, and configuring Hadoop to run in pseudo-distributed mode;
Step b: writing, on the local computer, a MapReduce program for calling the Monte Carlo software, and producing simulation input text to serve as input to the Monte Carlo software;
Step c: creating, in the cloud, a machine image with Hadoop and the Monte Carlo software installed, instantiating a certain number of virtual servers from the machine image, and configuring Hadoop on all virtual servers in the cloud to run in fully distributed mode, forming a Hadoop cluster;
Step d: uploading the local MapReduce program and the simulation input text to a virtual server, and running MapReduce on the cloud Hadoop cluster to perform the distributed computation of the Monte Carlo simulation.

2. The cloud computing-based Monte Carlo simulation acceleration method according to claim 1, characterized in that, in step b, writing the MapReduce program specifically comprises:
Step b1: writing a map program which, in order, reads a simulation task from standard input, calls the Monte Carlo software to perform the simulation, and writes the result to standard output as a key-value pair;
Step b2: writing a reduce program which, in order, reads from standard input the simulation results having the same key, merges the results, and writes the merged result to standard output as a key-value pair;
Step b3: writing a Hadoop Streaming job program which includes the input/output stream format of the map and reduce programs, the number of Map and Reduce tasks, the input text name, the output path, the mapper and reducer file names, and the paths of the files to upload.

3. The cloud computing-based Monte Carlo simulation acceleration method according to claim 2, characterized in that, in step b, producing the simulation input text specifically comprises:
Step b4: if the input of the Monte Carlo software is random numbers, generating the random numbers required for the actual simulation; if the input is a program file, generating the program file required for the actual simulation;
Step b5: grouping the random numbers or decomposing the program file according to the scale of the parallel computation to be performed, each group of random numbers or each sub-program file corresponding to one parallel simulation;
Step b6: writing the paths of the random numbers or program files line by line into one text, which serves as the input file.

4. The cloud computing-based Monte Carlo simulation acceleration method according to claim 2, characterized in that step b further comprises: running Hadoop Streaming on the local computer to debug the MapReduce program and verify the simulation input text.

5. The cloud computing-based Monte Carlo simulation acceleration method according to claim 4, characterized in that, in step c, configuring Hadoop on all virtual servers in the cloud to run in fully distributed mode specifically comprises: selecting one virtual server as the Master and one as the Secondary NameNode, the remaining virtual servers serving as Workers; on the local computer or on any virtual server in the cloud, modifying the Hadoop configuration files in turn according to the virtual server type and transferring them over the SSH protocol to the corresponding virtual servers, where they replace the configuration files in the original location; and performing the Hadoop initialization operation on the Master virtual server so that Hadoop runs in fully distributed mode, forming the Hadoop cluster.

6. The cloud computing-based Monte Carlo simulation acceleration method according to claim 5, characterized in that, in step d, running MapReduce on the cloud Hadoop cluster to perform the distributed computation of the Monte Carlo simulation is specifically: running the Hadoop Streaming job program, whereupon MapReduce automatically runs the map program and the reduce program on different Worker virtual servers, forming Map tasks and Reduce tasks; in a Map task, the map program reads the Monte Carlo simulation task, performs the simulation, and outputs the intermediate result; in a Reduce task, the reduce program reads the intermediate results, merges them, and outputs the merged result; and the running state of the cluster is monitored through the monitoring page provided by Hadoop.

7. The cloud computing-based Monte Carlo simulation acceleration method according to any one of claims 1 to 6, characterized in that step d further comprises: after the simulation calculation has finished, downloading the cloud simulation results to the local computer.

8. A cloud computing-based Monte Carlo simulation acceleration system, characterized by comprising a mode configuration module, a function writing module, a text production module, an image production module, a cluster configuration module, a data transmission module, and a simulation calculation module;
the mode configuration module is used to install Hadoop and Monte Carlo software on a local computer and to configure Hadoop to run in pseudo-distributed mode;
the function writing module is used to write, on the local computer, a MapReduce program for the Monte Carlo simulation;
the text production module is used to produce simulation input text on the local computer;
the image production module is used to create, in the cloud, a machine image with Hadoop and the Monte Carlo software installed, and to instantiate a certain number of virtual servers from the machine image;
the cluster configuration module is used to configure Hadoop on all virtual servers in the cloud to run in fully distributed mode, forming a Hadoop cluster;
the data transmission module is used to upload the MapReduce program and the simulation input text from the local computer to a virtual server;
the simulation calculation module is used to run MapReduce on the cloud Hadoop cluster to perform the distributed computation of the Monte Carlo simulation.

9. The cloud computing-based Monte Carlo simulation acceleration system according to claim 8, characterized by further comprising a function debugging module, the function debugging module being used to run Hadoop Streaming on the local computer to debug the MapReduce program and verify the simulation input text.

10. The cloud computing-based Monte Carlo simulation acceleration system according to claim 9, characterized by further comprising a data download module, the data download module being used to download the cloud simulation results to the local computer after the simulation calculation has finished.
CN201510885304.5A 2015-12-05 2015-12-05 A cloud computing-based Monte Carlo simulation acceleration method and system Active CN105335215B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510885304.5A CN105335215B (en) 2015-12-05 2015-12-05 A cloud computing-based Monte Carlo simulation acceleration method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510885304.5A CN105335215B (en) 2015-12-05 2015-12-05 A cloud computing-based Monte Carlo simulation acceleration method and system

Publications (2)

Publication Number Publication Date
CN105335215A CN105335215A (en) 2016-02-17
CN105335215B true CN105335215B (en) 2019-02-05

Family

ID=55285774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510885304.5A Active CN105335215B (en) 2015-12-05 2015-12-05 A cloud computing-based Monte Carlo simulation acceleration method and system

Country Status (1)

Country Link
CN (1) CN105335215B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740573B (en) * 2016-03-02 2019-10-11 苏州网颢信息科技有限公司 A two-step Monte Carlo simulation method for radiation dose calculation
CN107172650B (en) * 2016-03-08 2022-03-25 中兴通讯股份有限公司 A simulation method and system for a large-scale complex wireless communication system
CN105933154A (en) * 2016-04-28 2016-09-07 安徽四创电子股份有限公司 Management method of cloud calculation resources
CN106951324B (en) * 2017-03-10 2021-03-02 广东恒聚医疗科技有限公司 Parallel operation system and method for rapid FLUKA simulation
US10147103B2 (en) 2017-03-24 2018-12-04 International Business Machines Corporation System and method for a scalable recommender system using massively parallel processors
CN109729121B (en) * 2017-10-31 2022-05-06 阿里巴巴集团控股有限公司 Cloud storage system and method for realizing custom data processing in cloud storage system
CN110302475B (en) * 2018-03-20 2021-02-19 北京连心医疗科技有限公司 Cloud Monte Carlo dose verification analysis method, equipment and storage medium
US10928297B2 (en) 2019-01-09 2021-02-23 University Of Washington Method for determining detection angle of optical particle sizer
CN109978171B (en) * 2019-02-26 2023-10-10 南京航空航天大学 Grover quantum simulation algorithm optimization method based on cloud computing
CN111724451B (en) * 2020-06-09 2024-07-16 中国科学院苏州生物医学工程技术研究所 Tomographic image reconstruction acceleration method, system, terminal and storage medium based on cloud computing
CN112001108B (en) * 2020-07-08 2024-02-02 中国人民解放军战略支援部队信息工程大学 Cone beam CT Monte Carlo simulation cluster parallel acceleration method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130238621A1 (en) * 2012-03-06 2013-09-12 Microsoft Corporation Entity Augmentation Service from Latent Relational Data
CN103488775A (en) * 2013-09-29 2014-01-01 中国科学院信息工程研究所 Computing system and computing method for big data processing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Parallel data mining method based on the Hadoop cloud platform"; Yang Lai et al.; Journal of System Simulation; 2013-05-31; Vol. 25, No. 5; full text *

Also Published As

Publication number Publication date
CN105335215A (en) 2016-02-17

Similar Documents

Publication Publication Date Title
CN105335215B (en) A cloud computing-based Monte Carlo simulation acceleration method and system
Dreher et al. A flexible framework for asynchronous in situ and in transit analytics for scientific simulations
Cabarle et al. A spiking neural P system simulator based on CUDA
Huang et al. Blockemulator: An emulator enabling to test blockchain sharding protocols
Valencia-Cabrera et al. Simulation challenges in membrane computing
CN106168993A (en) Electrical network real-time simulation analysis platform
Li et al. SGL: towards a bridging model for heterogeneous hierarchical platforms
Sahebi et al. Distributed large-scale graph processing on FPGAs
Wei et al. LICOM3-CUDA: a GPU version of LASG/IAP climate system ocean model version 3 based on CUDA
Mei et al. Helix: Serving large language models over heterogeneous gpus and network via max-flow
Han et al. Bigdatabench-mt: A benchmark tool for generating realistic mixed data center workloads
Ono et al. Data centric framework for large-scale high-performance parallel computation
DeRose et al. Relative debugging for a highly parallel hybrid computer system
He et al. Scalability and efficiency challenges for the exascale supercomputing system: practice of a parallel supporting environment on the Sunway exascale prototype system
Ma et al. DVM: Towards a datacenter-scale virtual machine
Soytürk et al. Monitoring collective communication among GPUs
Walker et al. Composing and executing parallel data-flow graphs with shell pipes
Li et al. Research and application on cloud simulation
Wu et al. Parallel artificial neural network using CUDA-enabled GPU for extracting hydraulic domain knowledge of large water distribution systems
Hammond et al. Predictive simulation of HPC applications
Zehe Cloud Simulation for Large-Scale Agent-Based Traffic Simulations
Krol et al. Solving PDEs in modern multiphysics simulation software
Wang [Retracted] Heterogeneous Cluster Application Communication Optimization and Computer Big Data Management
Tadvin et al. HELICSAuto: Automating the Development of Cyber-Physical Co-Simulation Framework for Smart Grids
Tennander Energy Consumption of Micro Frontends: A comparison of micro frontends and single-page applications

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant