CN105335215B - A cloud computing-based Monte Carlo simulation acceleration method and system

Info

Publication number: CN105335215B
Application number: CN201510885304.5A
Authority: CN
Inventors: 刘仰川; 高欣
Original assignee: Suzhou Institute of Biomedical Engineering and Technology of CAS
Current assignee: Suzhou Institute of Biomedical Engineering and Technology of CAS
Priority date: 2015-12-05
Filing date: 2015-12-05
Publication date: 2019-02-05
Anticipated expiration: 2035-12-05
Also published as: CN105335215A

Abstract

The present invention relates to a cloud computing-based Monte Carlo simulation acceleration method and system. The method comprises: installing Hadoop and Monte Carlo software on a local computer and configuring Hadoop to run in pseudo-distributed mode; writing a MapReduce program and preparing a simulation input text file on the local computer; creating, in the cloud, a machine image with Hadoop and the Monte Carlo software installed, instantiating a number of virtual servers from that image, and configuring Hadoop on all cloud virtual servers to run in fully distributed mode, thereby forming a Hadoop cluster; and uploading the local MapReduce program and simulation input text file to a virtual server and running MapReduce on the cloud Hadoop cluster to perform distributed computation of the Monte Carlo simulation. The invention allows the number and configuration of virtual servers to be chosen flexibly and can be used from any location with network access.

Description

A cloud computing-based Monte Carlo simulation acceleration method and system

Technical field

The invention belongs to the field of cloud computing technology, and in particular relates to a cloud computing-based Monte Carlo simulation acceleration method and system.

Background art

The Monte Carlo (MC) method, also known as the random sampling technique or the statistical testing method, differs considerably from conventional numerical methods: it is grounded in probability and statistics theory. Because the Monte Carlo method can realistically describe the characteristics of things and the course of physical experiments, it can solve some problems that are difficult for numerical methods, and it therefore has a wide range of applications.

The advantages of the Monte Carlo method include: it can realistically describe things of a random nature and physical experimental processes; it is only weakly constrained by geometric conditions; its convergence rate is independent of the dimension of the problem; it can compute multiple schemes and multiple unknowns simultaneously; its error is easy to determine; and its program structure is simple and easy to implement. These advantages have steadily broadened its range of application. Its main applications include particle transport problems, statistical physics, typical mathematical problems, vacuum technology, laser technology, medicine, biology, and mineral prospecting. Within particle transport, applications mainly cover experimental nuclear physics, reactor physics, and high-energy physics.

The disadvantages of the Monte Carlo method are also prominent: its convergence is slow; its error is probabilistic; and, in particle transport problems, the computed result depends on the system size. When Monte Carlo simulation is used to solve a complex problem (or model), slow convergence becomes especially noticeable. In addition, to improve simulation accuracy (that is, how closely the result approximates the true solution), the Monte Carlo method requires the number of random numbers to reach the order of millions or even tens of millions. Slow convergence and the huge number of random numbers make the computational load of a Monte Carlo simulation surge and its run time long, which limits the method's use in fields with demanding real-time requirements, such as radiotherapy planning.
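The slow convergence described above can be made concrete with the standard error of a sample-mean estimator. This is a textbook result rather than a formula taken from the patent; the symbols N (number of independent samples), X_i (the samples), and σ (their standard deviation) are introduced here only for illustration.

```latex
\hat{\mu}_N = \frac{1}{N}\sum_{i=1}^{N} X_i ,
\qquad
\operatorname{SE}\!\left(\hat{\mu}_N\right) = \frac{\sigma}{\sqrt{N}}
```

Under this scaling, cutting the statistical error by a factor of 10 requires about 100 times as many samples, which is why sample counts in the millions or tens of millions arise in practice.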

In the Monte Carlo method, the random process of each random number (or particle) in the model is independent of the others, so the computation can be parallelized. Monte Carlo programs in general share this parallel character: the computing task is decomposed into subtasks that can be computed independently, each subtask is handed to a computing unit that produces a sub-result, and the sub-results are then merged. Monte Carlo simulation acceleration methods are built on this property.
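The decompose, compute, and merge pattern described above can be sketched as follows. This is a minimal illustration that assumes a toy problem (estimating π by sampling the unit square), not the patent's simulation; the function names `run_subtask` and `merge` are hypothetical.

```python
import random


def run_subtask(seed: int, n_samples: int) -> int:
    """Independent subtask: count samples that fall inside the quarter circle."""
    rng = random.Random(seed)  # each subtask owns its own random stream
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def merge(sub_results: list[tuple[int, int]]) -> float:
    """Merge (hits, samples) pairs from all subtasks into one estimate of pi."""
    total_hits = sum(hits for hits, _ in sub_results)
    total_samples = sum(n for _, n in sub_results)
    return 4.0 * total_hits / total_samples


if __name__ == "__main__":
    n_subtasks, n_per_task = 8, 50_000
    # The subtasks do not depend on one another, so they could run on
    # separate computing units; here they run one after another.
    results = [(run_subtask(seed, n_per_task), n_per_task)
               for seed in range(n_subtasks)]
    print(merge(results))
```

Because each subtask depends only on its own seed and sample count, the list comprehension could be replaced by any parallel executor without changing `merge`.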

Existing Monte Carlo simulation acceleration methods include:

(1) Monte Carlo simulation acceleration based on CPU clusters

In high-performance computing, CPU cluster computing was developed relatively early. Most of the more mature domestic supercomputers use arrays of hundreds of CPUs. MPI is currently the most important means of implementing parallel programs; it defines a portable programming interface, so the programmer only needs to design the parallel algorithm and call the relevant functions in the MPI library to run it on multiple computing units. For example, Lu Yune of the Institutes of Technology of Changsha, in the published paper "MPI-based parallel computing on a microcomputer cluster system", used MPI parallel programming to accelerate the computation of π by Monte Carlo integration. The experiments showed that solving the problem with a parallel program on a multi-computer cluster is faster than using a single computer. As another example, Fu You of the Shandong University of Science and Technology, in the published paper "Research and implementation of an interactive parallelization system for the low-density gas direct simulation Monte Carlo method", used an 8-node cluster system to perform interactive parallel computation of the direct simulation Monte Carlo (DSMC) method for low-density gas. The author converted a serial DSMC source program written in Fortran 77 into a parallel source program for the MPI environment and obtained a good speedup.

(2) Monte Carlo simulation acceleration based on GPUs (or GPU clusters)

GPU stands for Graphics Processing Unit. The GPU is the "brain" of the graphics card, a dedicated image processing core. Its most notable characteristic is its great computing power, which can exceed that of a CPU many times over. Making effective use of idle GPU time and exploiting the GPU's potential has become a focus of industry attention; this is so-called "GPU hardware acceleration". The speed, parallelism, and programmability of the GPU provide a good platform for general-purpose computation beyond image processing, and GPUs have been applied well in areas such as algebraic computation and the solution of partial differential equations. CUDA (Compute Unified Device Architecture) is a general-purpose parallel computing architecture released by the graphics card manufacturer NVIDIA and is now widely used. CUDA comprises an instruction set architecture (ISA) and the parallel computing engine inside the GPU; developers can write programs in C that run with very high performance on CUDA-capable GPUs. For example, Dr. Quan Guotao of the Central China University of Science and Technology, in the doctoral thesis "Theoretical and experimental study of reconstruction algorithms for steady-state fluorescence molecular tomography", proposed a reconstruction algorithm for Monte Carlo-based steady-state fluorescence molecular tomography (FMT) that is accelerated by a GPU cluster. The method uses three computers equipped with NVIDIA graphics cards on a local area network, builds a GPU cluster with Message Passing Interface (MPI) technology, and distributes the total computing task evenly over the three compute nodes to achieve parallel computation on multiple GPUs. The NVIDIA graphics cards in the three computers are G200 cards, and the GPU program was written in CUDA. By accelerating with the GPU cluster, the author addressed the large time cost of the MC method in FMT reconstruction and obtained a good speedup. As another example, He Yongxiang of the Central China University of Science and Technology, in the published paper "Efficient GPU parallel computing for aerodynamic direct simulation Monte Carlo", implemented CUDA-based parallel computation of aerodynamic direct simulation Monte Carlo. Using NVIDIA Tesla C2075 GPUs, the author ran acceleration experiments with a single GPU, two GPUs, and multiple GPUs; compared with CPU computation, the results showed a good speedup while preserving computational accuracy.

Existing Monte Carlo simulation acceleration methods have the following shortcomings:

(1) When a GPU-accelerated parallel program is developed in a language such as CUDA, the developer must think in parallel terms, and the resulting program is hard to debug and must be optimized repeatedly before it delivers a good speedup.

(2) Purchasing a supercomputer to obtain a CPU cluster is costly, while a small CPU cluster assembled from several computers has limited computing power. Developing parallel programs with MPI requires the developer to manage memory and threads manually, which is difficult. In addition, a local computer cluster needs ongoing maintenance, which adds personnel cost.

Summary of the invention

The present invention provides a cloud computing-based Monte Carlo simulation acceleration method and system, intended to solve, at least to some extent, one of the above technical problems in the prior art.

The invention is implemented as follows. A cloud computing-based Monte Carlo simulation acceleration method comprises the following steps:

Step a: install Hadoop and Monte Carlo software on a local computer, and configure Hadoop to run in pseudo-distributed mode;

Step b: write a MapReduce program on the local computer, and prepare a simulation input text file;

Step c: create, in the cloud, a machine image with Hadoop and the Monte Carlo software installed, instantiate a number of virtual servers from the created machine image, and configure Hadoop on all cloud virtual servers to run in fully distributed mode, forming a Hadoop cluster;

Step d: upload the local MapReduce program and simulation input text file to a virtual server, and run MapReduce on the cloud Hadoop cluster to perform distributed computation of the Monte Carlo simulation.

In a technical solution adopted by an embodiment of the present invention, writing the MapReduce program in step b specifically includes:

Step b1: write the map program, which in order reads a simulation task from standard input, calls the Monte Carlo program to perform the simulation, and writes the result to standard output as a key-value pair;

Step b2: write the reduce program, which in order reads the simulation results that share the same key from standard input, merges them, and writes the merged result to standard output as a key-value pair;

Step b3: write the Hadoop Streaming job script, which specifies the input and output stream formats of the map and reduce programs, the numbers of Map and Reduce tasks, the input text file name, the output path, the mapper and reducer file names, and the paths of the files to upload.
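The map and reduce programs of steps b1 and b2 might look like the following minimal sketch. It assumes tab-separated key-value lines, which is the Hadoop Streaming default, and a single shared key; `fake_simulate` is a hypothetical stand-in for invoking the real Monte Carlo program, and summation is only one possible merge rule.

```python
from typing import Callable, Iterable, Iterator


def run_mapper(lines: Iterable[str],
               simulate: Callable[[str], float],
               key: str = "result") -> Iterator[str]:
    """Map side: each non-empty input line is one task; emit 'key<TAB>value'."""
    for line in lines:
        task = line.strip()
        if not task:
            continue
        yield f"{key}\t{simulate(task)}"


def run_reducer(lines: Iterable[str]) -> Iterator[str]:
    """Reduce side: lines arrive sorted by key; sum the values of each key."""
    current_key, total = None, 0.0
    for line in lines:
        if not line.strip():
            continue
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                yield f"{current_key}\t{total}"
            current_key, total = key, 0.0
        total += float(value)
    if current_key is not None:
        yield f"{current_key}\t{total}"


def fake_simulate(task: str) -> float:
    """Hypothetical placeholder: a real mapper would run the Monte Carlo program."""
    return float(len(task))
```

In an actual mapper script the lines would come from `sys.stdin` and each yielded string would be printed to standard output; keeping the logic in plain functions lets it be checked locally before any cluster is involved.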

In a technical solution adopted by an embodiment of the present invention, preparing the simulation input text file in step b specifically includes:

Step b4: if the input of the Monte Carlo software is random numbers, generate the random numbers required by the actual simulation; if the input is a program file, generate the program file required by the actual simulation;

Step b5: according to the scale of the parallel computation to be performed, group the random numbers or decompose the program file, so that each group of random numbers or each sub-program file corresponds to one parallel simulation;

Step b6: write the paths of the random-number files or program files into a text file, one per line, to serve as the input file.

In a technical solution adopted by an embodiment of the present invention, step b further includes: running Hadoop Streaming on the local computer to debug the MapReduce program and verify the simulation input text file.

In a technical solution adopted by an embodiment of the present invention, in step c, configuring Hadoop on all cloud virtual servers to run in fully distributed mode specifically includes: selecting one virtual server as the Master and one as the Secondary NameNode, with the remaining virtual servers acting as Workers; on the local computer or on any cloud virtual server, using the SSH protocol to modify the Hadoop configuration files in turn according to the type of each virtual server, transfer them to the corresponding virtual server, and replace the configuration files at their original locations; and performing the Hadoop initialization on the Master virtual server so that Hadoop runs in fully distributed mode, forming a Hadoop cluster.

In a technical solution adopted by an embodiment of the present invention, in step d, running MapReduce in the cloud to perform distributed computation of the Monte Carlo simulation is specifically: run the Hadoop Streaming job script; MapReduce automatically runs the map program and the reduce program on different Worker virtual servers, forming Map tasks and Reduce tasks; in a Map task, the map program reads the Monte Carlo simulation task, performs the simulation, and outputs the intermediate result; in a Reduce task, the reduce program reads the intermediate results, merges them, and outputs the merged result; the running state of the cluster is monitored through the monitoring page provided by Hadoop.

In a technical solution adopted by an embodiment of the present invention, step d further includes: after the simulation finishes, downloading the cloud simulation results to the local computer.

A further technical solution adopted by an embodiment of the present invention is a cloud computing-based Monte Carlo simulation acceleration system comprising a mode configuration module, a function writing module, a text creation module, a cluster configuration module, a data transmission module, and a simulation computing module;

The mode configuration module is used to install Hadoop and Monte Carlo software on the local computer and to configure Hadoop to run in pseudo-distributed mode;

The function writing module is used to write the MapReduce program for the Monte Carlo simulation on the local computer;

The text creation module is used to prepare the simulation input text file on the local computer;

The image creation module is used to create, in the cloud, a machine image with Hadoop and the Monte Carlo software installed, and to instantiate a number of virtual servers from the created machine image;

The cluster configuration module is used to configure Hadoop on all cloud virtual servers to run in fully distributed mode, forming a Hadoop cluster;

The data transmission module is used to upload the MapReduce program and simulation input text file from the local computer to a virtual server;

The simulation computing module is used to run MapReduce on the cloud Hadoop cluster and perform distributed computation of the Monte Carlo simulation.

In a technical solution adopted by an embodiment of the present invention, the system further comprises a function debugging module, which is used to run Hadoop Streaming on the local computer, debug the MapReduce program, and verify the simulation input text file.

In a technical solution adopted by an embodiment of the present invention, the system further comprises a data download module, which is used to download the cloud simulation results to the local computer after the simulation finishes.

The cloud computing-based Monte Carlo simulation acceleration method and system of the embodiments of the present invention build a Hadoop cluster from virtual servers provided by a cloud platform and rely on the MapReduce framework for distributed computing. The user only needs to implement, in custom map and reduce functions, steps such as calling the Monte Carlo program and processing intermediate results; running them on the Hadoop cluster then accelerates the Monte Carlo simulation on the MapReduce distributed computing framework. Because the computation runs in the cloud, the number and configuration of virtual servers can be chosen flexibly, billing by usage time keeps costs under control, and the invention can be used from any location with network access.

Brief description of the drawings

Fig. 1 is a flowchart of the cloud computing-based Monte Carlo simulation acceleration method of an embodiment of the present invention;

Fig. 2 is a flowchart of the method for writing the MapReduce program on the local computer in an embodiment of the present invention;

Fig. 3 is a flowchart of the method for preparing the simulation input text file on the local computer in an embodiment of the present invention;

Fig. 4 is a schematic diagram of cloud computing service models;

Fig. 5 is a schematic structural diagram of the cloud computing-based Monte Carlo simulation acceleration system of an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.

Refer to Fig. 1, a flowchart of the cloud computing-based Monte Carlo simulation acceleration method of an embodiment of the present invention. The method comprises the following steps:

Step 100: install Hadoop and Monte Carlo software on the local computer, and configure Hadoop to run in pseudo-distributed mode;

In step 100, Hadoop is a distributed computing architecture developed by the Apache Foundation and is open-source software. With Hadoop, users can develop distributed programs without understanding the low-level details of distribution and can make full use of the power of a cluster for high-speed computation and storage. The core of Hadoop consists of HDFS (Hadoop Distributed File System) and the MapReduce framework: HDFS provides storage for massive data, and the MapReduce framework provides computation over massive data.

Hadoop has three run modes:

(1) Standalone mode

Standalone mode is Hadoop's default mode, in which Hadoop runs entirely locally. Because it does not need to interact with other nodes, standalone mode does not use HDFS and does not launch any Hadoop daemons. This mode is mainly used to develop and debug the application logic of MapReduce programs.

(2) Pseudo-distributed mode

In pseudo-distributed mode, Hadoop runs on a "single-node cluster": all daemons run on the same machine, and HDFS is built on the local file system. On top of standalone mode, this mode adds code debugging capabilities, allowing memory usage, HDFS input and output, and interaction with the other daemons to be inspected. A program debugged in this mode can run in fully distributed mode without modification.

(3) Fully distributed mode

The Hadoop daemons run on a cluster, achieving truly distributed computation and storage. The daemons run on different machines, and HDFS is likewise built across different machines. Machines in the cluster take one of three roles: the Master machine is responsible for task scheduling, the Secondary NameNode machine is responsible for backing up critical data from the Master machine, and the Worker machines carry out the data processing tasks.

Step 200: write the MapReduce program for the Monte Carlo simulation on the local computer;

To describe step 200 clearly, refer also to Fig. 2, a flowchart of the method for writing the MapReduce program on the local computer in an embodiment of the present invention. The method comprises the following steps:

Step 201: write the map program, which in order reads a simulation task from standard input (stdin), calls the Monte Carlo program to perform the simulation, and writes the result to standard output (stdout) as a key-value pair (KVP);

Step 202: write the reduce program, which in order reads the simulation results that share the same key from stdin, merges them, and writes the merged result to stdout as a KVP;

Step 203: write the Hadoop Streaming job script, which specifies the input and output stream formats of the map and reduce programs, the numbers of Map and Reduce tasks, the input text file name, the output path, the mapper and reducer file names, the paths of the files to upload, and so on.
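As a rough illustration of what the job script of step 203 has to specify, the sketch below only assembles a Hadoop Streaming command line and does not run it. The jar location and all file names are hypothetical and vary by installation; the function name `build_streaming_command` is likewise an assumption. The options used (`-D`, `-files`, `-input`, `-output`, `-mapper`, `-reducer`) are standard Hadoop Streaming options, and the scripts are assumed to be executable.

```python
import posixpath
import shlex


def build_streaming_command(streaming_jar, input_path, output_path,
                            mapper_path, reducer_path,
                            num_maps, num_reduces):
    """Assemble the argument list for a Hadoop Streaming job (not executed)."""
    return [
        "hadoop", "jar", streaming_jar,
        # Generic options must come before the streaming-specific options.
        "-D", f"mapreduce.job.maps={num_maps}",      # a hint; input splits decide
        "-D", f"mapreduce.job.reduces={num_reduces}",
        "-files", f"{mapper_path},{reducer_path}",   # ship both scripts to workers
        "-input", input_path,
        "-output", output_path,
        "-mapper", posixpath.basename(mapper_path),
        "-reducer", posixpath.basename(reducer_path),
    ]


if __name__ == "__main__":
    cmd = build_streaming_command(
        "/path/to/hadoop-streaming.jar",   # hypothetical location
        "input.txt", "mc_output",          # hypothetical input file and output dir
        "scripts/mapper.py", "scripts/reducer.py",
        num_maps=8, num_reduces=1,
    )
    print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so the same list can be printed for inspection or handed to a process launcher once the paths are known to be valid.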

Step 300: prepare the simulation input text file on the local computer;

To describe step 300 clearly, refer also to Fig. 3, a flowchart of the method for preparing the simulation input text file on the local computer in an embodiment of the present invention. The method comprises the following steps:

Step 301: if the input of the Monte Carlo software is random numbers, generate the random numbers required by the actual simulation; if the input is a program file, generate the program file required by the actual simulation;

Step 302: according to the scale of the parallel computation to be performed (the number of threads), group the random numbers or decompose the program file, so that each group of random numbers or each sub-program file corresponds to one parallel simulation;

Step 303: write the paths of the random-number files or program files into a text file, one per line, to serve as the input file.
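Steps 301 to 303 can be sketched for the random-number case as below. This is an illustration under assumptions, not the patent's implementation: the file names (`group_000.txt`, `input.txt`), the round-robin grouping, and the function name `make_input_text` are all hypothetical.

```python
import random
import tempfile
from pathlib import Path


def make_input_text(out_dir: Path, n_numbers: int, n_groups: int,
                    seed: int = 0) -> Path:
    """Generate random numbers, split them into groups, and list the group files."""
    rng = random.Random(seed)
    numbers = [rng.random() for _ in range(n_numbers)]      # step 301
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    for g in range(n_groups):                               # step 302
        group = numbers[g::n_groups]  # round-robin split: one group per simulation
        path = out_dir / f"group_{g:03d}.txt"
        path.write_text("\n".join(repr(x) for x in group) + "\n")
        paths.append(path)

    index = out_dir / "input.txt"                           # step 303
    index.write_text("\n".join(str(p) for p in paths) + "\n")
    return index


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        index_file = make_input_text(Path(tmp), n_numbers=1000, n_groups=4)
        print(index_file.read_text())
```

Each line of the resulting index file names one group, so a map program that receives one line has everything it needs to run one independent simulation.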

Step 400: run Hadoop Streaming on the local computer, debug the MapReduce program, and verify the simulation input text file;

In step 400, implementing distributed computing with Hadoop requires developing a MapReduce application. The simplest MapReduce application contains at least one map function, one reduce function, and one main function. The general format followed by the map and reduce functions is:

map:(k1,v1)→list(k2,v2)

reduce:(k2,list(v2))→list(k3,v3)

Here, the map function receives a set of data and converts it into a list of key/value pairs, with each element of the input domain corresponding to one key/value pair; the reduce function receives the list produced by the map function and reduces it by key, generating one key/value pair for each key.

Throughout the writing of the map and reduce functions, the input data come from the underlying distributed file system HDFS, intermediate data are kept on the local file system, and the final output data are written to HDFS.
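The two signatures above can be exercised locally with a small in-memory driver. The sketch below assumes nothing about Hadoop itself: `map_reduce`, `count_map`, and `sum_reduce` are hypothetical names, and the grouping step stands in for the shuffle that the framework performs between the two phases.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Apply map_fn to every (k1, v1), group by k2, then apply reduce_fn per key."""
    groups = defaultdict(list)
    for k1, v1 in records:
        # map: (k1, v1) -> list(k2, v2)
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)

    output = []
    for k2 in sorted(groups):
        # reduce: (k2, list(v2)) -> list(k3, v3)
        output.extend(reduce_fn(k2, groups[k2]))
    return output


def count_map(task_id, n_samples):
    """Example map function: report the sample count and one finished task."""
    return [("samples", n_samples), ("tasks", 1)]


def sum_reduce(key, values):
    """Example reduce function: one output pair per key, holding the sum."""
    return [(key, sum(values))]


if __name__ == "__main__":
    print(map_reduce([(0, 3), (1, 4)], count_map, sum_reduce))
```

The example map function emits two keys per input record, which shows that the reduce step is called once per distinct key rather than once per record.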

Step 500: create, in the cloud, a machine image with Hadoop and the Monte Carlo software installed; select a hardware configuration according to the computing needs; and instantiate a number of virtual servers from the created machine image;

In step 500, cloud computing is the product of the development and convergence of traditional computer and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, and load balancing. It aims to integrate, over a network, many relatively low-cost computing entities into one complete system with great computing power and, through business models such as SaaS, PaaS, and IaaS, to deliver that computing power to end users. At present, the main service models of cloud computing are SaaS (Software as a Service), PaaS (Platform as a Service), and IaaS (Infrastructure as a Service), as shown in Fig. 4, a schematic diagram of cloud computing service models. The present invention builds a Hadoop cluster on the IaaS layer of a cloud platform and uses the MapReduce framework in Hadoop to perform distributed computation of the Monte Carlo simulation. IaaS offers "cloud" infrastructure consisting of multiple servers to customers as a metered service. It integrates memory, I/O devices, storage, and computing power into a virtual resource pool and provides the whole industry with services such as storage resources and virtualized servers. An IaaS cloud service provider can supply hosts in a variety of configurations; the user needs to decide on the machine image and the host hardware configuration.

The machine images supplied by cloud service providers generally do not meet development needs, so the environment must be configured further. The machine image created in the cloud in the embodiment of the present invention has the Monte Carlo software and Hadoop installed and has the same environment configuration as the local computer, so that a program debugged on the local computer can run directly in the cloud. Any number of virtual servers (also called "nodes" or "instances") can be instantiated from the created machine image, which avoids the tedium of configuring the environment one by one on a large number of instantiated virtual servers.

Step 600: configure Hadoop on all cloud virtual servers to run in fully distributed mode, forming a Hadoop cluster;

In step 600, configuring Hadoop on all cloud virtual servers to run in fully distributed mode specifically includes: selecting one node as the Master and one as the Secondary NameNode, with the remaining nodes acting as Workers; on the local computer or on any cloud node, using the SSH protocol to modify the Hadoop configuration files in turn according to the node type, transfer them to the corresponding node, and replace the configuration files at their original locations; and performing the Hadoop initialization on the Master node so that Hadoop runs in fully distributed mode, forming a Hadoop cluster.
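The role assignment and per-node configuration files of step 600 could be prepared along the following lines. The patent does not say which configuration properties are changed, so this sketch assumes only two common pieces: the `fs.defaultFS` property of `core-site.xml` and a plain list of worker host names. The host names, port 9000, and all function names are hypothetical, and transferring the files over SSH is left out.

```python
import xml.etree.ElementTree as ET


def assign_roles(hosts):
    """First host is the Master, second the Secondary NameNode, the rest Workers."""
    if len(hosts) < 3:
        raise ValueError("need at least a master, a secondary and one worker")
    return {"master": hosts[0], "secondary": hosts[1], "workers": list(hosts[2:])}


def render_core_site(master_host, port=9000):
    """Build core-site.xml text pointing every node at the Master's HDFS."""
    root = ET.Element("configuration")
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = f"hdfs://{master_host}:{port}"  # assumed port
    return ET.tostring(root, encoding="unicode")


def render_workers_file(workers):
    """One worker host name per line, as read by the cluster start scripts."""
    return "".join(f"{host}\n" for host in workers)


if __name__ == "__main__":
    roles = assign_roles(["node0", "node1", "node2", "node3"])  # hypothetical hosts
    print(render_core_site(roles["master"]))
    print(render_workers_file(roles["workers"]), end="")
```

Generating the files from one role table keeps the copies sent to different node types consistent with each other.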

Step 700: upload the MapReduce program and simulation input text file from the local computer to the Master virtual server;

Step 800: run MapReduce on the cloud Hadoop cluster to perform distributed computation of the Monte Carlo simulation;

In step 800, running MapReduce in the cloud to perform distributed computation of the Monte Carlo simulation is specifically: run the Hadoop Streaming job script; MapReduce automatically runs the map program and the reduce program on different Worker nodes, forming Map tasks and Reduce tasks; in a Map task, the map program reads the Monte Carlo simulation task, performs the simulation, and outputs the intermediate result; in a Reduce task, the reduce program reads the intermediate results, merges them, and outputs the merged result; the running state of the cluster is monitored through the monitoring page provided by Hadoop.

Step 900: after the simulation finishes, download the cloud simulation results to the local computer.

Refer to Fig. 5, a schematic structural diagram of the cloud computing-based Monte Carlo simulation acceleration system of an embodiment of the present invention. The system comprises a mode configuration module, a function writing module, a text creation module, a function debugging module, an image creation module, a cluster configuration module, a data transmission module, a simulation computing module, and a data download module;

The mode configuration module is used to install Hadoop and Monte Carlo software on the local computer and to configure Hadoop to run in pseudo-distributed mode;

The function writing module is used to write the MapReduce program for the Monte Carlo simulation on the local computer. The function writing module writes the MapReduce program as follows: write the map program, which in order reads a simulation task from standard input (stdin), calls the Monte Carlo program to perform the simulation, and writes the result to standard output (stdout) as a key-value pair (KVP); write the reduce program, which in order reads the simulation results that share the same key from stdin, merges them, and writes the merged result to stdout as a KVP; write the Hadoop Streaming job script, which specifies the input and output stream formats of the map and reduce programs, the numbers of Map and Reduce tasks, the input text file name, the output path, the mapper and reducer file names, the paths of the files to upload, and so on.

The text creation module is used to prepare the simulation input text file on the local computer. The text creation module prepares the simulation input text file as follows: if the input of the Monte Carlo software is random numbers, generate the random numbers required by the actual simulation; if the input is a program file, generate the program file required by the actual simulation; according to the scale of the parallel computation to be performed (the number of threads), group the random numbers or decompose the program file, so that each group of random numbers or each sub-program file corresponds to one parallel simulation; write the paths of the random-number files or program files into a text file, one per line, to serve as the input file.

The function debugging module is used to run Hadoop Streaming on the local computer, debug the MapReduce program, and verify the simulation input text file. Implementing distributed computing with Hadoop requires developing a MapReduce application. The simplest MapReduce application contains at least one map function, one reduce function, and one main function. The general format followed by the map and reduce functions is:

map:(k1,v1)→list(k2,v2)

reduce:(k2,list(v2))→list(k3,v3)

Mirror image production module has the machine images of Hadoop and Monte Carlo software for fabrication and installation beyond the clouds, according to meter It calculates and needs, selected hardware configuration instantiates a certain number of virtual servers using the machine images of production；

The Hadoop that cluster configuration module is used to configure on all virtual servers in cloud operates in super distributed mode, is formed Hadoop cluster；Wherein, the Hadoop on the configuration all virtual servers in cloud operates in super distributed mode and specifically includes: respectively Select a node as Master and Secondary NameNode, remaining node is as Worker；In local computer or In any node of cloud, using SSH communications protocol, Hadoop configuration file successively is changed according to node type, and they are passed It transports on corresponding node, replaces the configuration file of original position；Hadoop initialization operation is carried out on Master node, is made Hadoop operates in complete distribution pattern, forms Hadoop cluster.

The data transmission module is used to upload the MapReduce program and the simulation input text from the local computer to the Master virtual server.

The simulation calculation module is used to run MapReduce on the cloud Hadoop cluster to perform the distributed computation of the Monte Carlo simulation. Specifically, the Hadoop Streaming job program is run, and MapReduce automatically runs the map program and the reduce program on different Worker nodes, forming Map tasks and Reduce tasks. In a Map task, the map program reads the Monte Carlo simulation task, performs the simulation calculation and outputs the intermediate result. In a Reduce task, the reduce program reads the intermediate results, merges them and outputs the merged result. The running status of the cluster is monitored through the monitoring page provided by Hadoop.
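A Hadoop Streaming job of the kind described above is usually launched with a `hadoop jar` command. The sketch below only assembles such a command as an argument list; the jar location, file names and paths passed in are hypothetical, while `-input`, `-output`, `-mapper`, `-reducer`, `-files` and `-D mapreduce.job.reduces` are standard Hadoop Streaming and generic options. Generic options are placed before the streaming options, as Hadoop requires.

```python
import shlex


def build_streaming_command(streaming_jar, input_path, output_path,
                            mapper, reducer, n_reduces=1, files=()):
    """Assemble the argument list for one Hadoop Streaming job (does not run it)."""
    cmd = ["hadoop", "jar", streaming_jar]
    # Generic options (-D, -files) must come before the streaming options.
    cmd += ["-D", f"mapreduce.job.reduces={n_reduces}"]
    if files:
        cmd += ["-files", ",".join(files)]
    cmd += ["-input", input_path, "-output", output_path,
            "-mapper", mapper, "-reducer", reducer]
    return cmd


if __name__ == "__main__":
    # Hypothetical names, printed as a shell command for inspection.
    print(shlex.join(build_streaming_command(
        "hadoop-streaming.jar", "input.txt", "mc_output",
        "mapper.py", "reducer.py", n_reduces=1,
        files=("mapper.py", "reducer.py"))))
```

The returned list can be passed to `subprocess.run` on the Master node once the input text has been placed in the cluster's file system.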

The data download module is used to download the cloud simulation results to the local computer after the simulation calculation is completed.

The cloud computing-based Monte Carlo simulation acceleration method and system of the embodiments of the present invention build a Hadoop cluster from virtual servers provided by a cloud platform and rely on the MapReduce framework to realize distributed computing. The user only needs to implement, in the user-defined map and reduce functions, steps such as calling the Monte Carlo program and processing the intermediate results; running these functions on the Hadoop cluster accelerates the Monte Carlo simulation on the distributed computing framework MapReduce. Because the computation is carried out in the cloud, the number and configuration of virtual servers can be chosen flexibly, time-based billing keeps the cost controllable, and the system can be used from any location with network access.

The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A cloud computing-based Monte Carlo simulation acceleration method, comprising the following steps:

Step a: Install Hadoop and Monte Carlo software on the local computer, and configure Hadoop to run in pseudo-distributed mode;

Step b: Write a MapReduce program for calling Monte Carlo software on the local computer, and make simulation input text for Monte Carlo software input;

Step c: Make a machine image installed with Hadoop and Monte Carlo software in the cloud, instantiate a certain number of virtual servers using the machine image produced, and configure Hadoop on all virtual servers in the cloud to run in a fully distributed mode to form a Hadoop cluster;

Step d: Upload the local MapReduce program and simulation input text to the virtual server, and run MapReduce on the cloud Hadoop cluster to perform distributed computing of Monte Carlo simulation.

2. The cloud computing-based Monte Carlo simulation acceleration method according to claim 1, wherein in the step b, the writing MapReduce program specifically comprises:

Step b1: write a map program, the program sequentially includes reading the simulation calculation task from the standard input, calling the Monte Carlo software to perform the simulation calculation, and writing the calculation result in the form of a key-value pair to the standard output;

Step b2: Write a reduce program, the program sequentially includes reading the simulation calculation results with the same key from the standard input, merging the calculation results, and writing the combined result in the form of a key-value pair to the standard output;

Step b3: Write a Hadoop Streaming job program, which includes the input and output stream formats of the map and reduce programs, the number of Map and Reduce tasks, the input text name, the output path, the mapper and reducer file names, and the upload file path.

3. The cloud computing-based Monte Carlo simulation acceleration method according to claim 2, wherein in the step b, making the simulation input text specifically comprises:

Step b4: If the input of the Monte Carlo software is a random number, the random number required for the actual simulation is generated; if the input is a program file, the program file required for the actual simulation is generated;

Step b5: according to the scale of parallel computing, group random numbers or decompose program files, each group of random numbers or each subprogram file corresponds to a parallel simulation;

Step b6: Write the random number or the path of the program file into a text line by line as the input file.

4. The cloud computing-based Monte Carlo simulation acceleration method according to claim 2, wherein the step b further comprises: running Hadoop Streaming on the local computer, debugging the MapReduce program and verifying the simulation input text.

5. The cloud computing-based Monte Carlo simulation acceleration method according to claim 4, wherein in the step c, configuring Hadoop on all virtual servers in the cloud to run in a fully distributed mode specifically comprises: selecting one virtual server each as the Master and the Secondary NameNode, and using the other virtual servers as Workers; on the local computer or any virtual server in the cloud, using the SSH communication protocol to modify the Hadoop configuration files in turn according to the type of virtual server, transferring them to the corresponding virtual servers, and replacing the configuration files in their original locations; and performing the Hadoop initialization operation on the Master virtual server, so that Hadoop runs in a fully distributed mode to form a Hadoop cluster.

6. The cloud computing-based Monte Carlo simulation acceleration method according to claim 5, wherein in the step d, running MapReduce on the cloud Hadoop cluster to perform distributed computing of Monte Carlo simulation specifically comprises: running the Hadoop Streaming job program, whereby MapReduce automatically runs the map program and the reduce program on different Worker virtual servers to form Map tasks and Reduce tasks; in a Map task, the map program reads the Monte Carlo simulation task, performs the simulation calculation and outputs the intermediate result; in a Reduce task, the reduce program reads the intermediate results, merges them and outputs the result; and the cluster running status is monitored through the monitoring page provided by Hadoop.

7. The cloud computing-based Monte Carlo simulation acceleration method according to any one of claims 1 to 6, wherein the step d further comprises: downloading the cloud simulation result to the local computer after the simulation calculation is completed.

8. A Monte Carlo simulation acceleration system based on cloud computing, characterized in that it comprises a mode configuration module, a function writing module, a text production module, an image production module, a cluster configuration module, a data transmission module and a simulation calculation module;

The mode configuration module is used to install Hadoop and Monte Carlo software on the local computer, and configure Hadoop to run under pseudo-distributed mode;

The function writing module is used to write a MapReduce program for Monte Carlo simulation on the local computer;

The text production module is used to make simulation input text on the local computer;

The image production module is used to make a machine image installed with Hadoop and Monte Carlo software in the cloud, and to instantiate a certain number of virtual servers using the machine image produced;

The cluster configuration module is used to configure Hadoop on all virtual servers in the cloud to run in a fully distributed mode to form a Hadoop cluster;

The data transmission module is used for uploading the MapReduce program and simulation input text of the local computer to the virtual server;

The simulation calculation module is used to run MapReduce on the cloud Hadoop cluster to perform distributed computing of Monte Carlo simulation.

9. The cloud computing-based Monte Carlo simulation acceleration system according to claim 8, further comprising a function debugging module, wherein the function debugging module is used to run Hadoop Streaming on the local computer, debug the MapReduce program and verify the simulation input text.

10. The cloud computing-based Monte Carlo simulation acceleration system according to claim 9, further comprising a data download module, wherein the data download module is used to download the cloud simulation result to the local computer after the simulation calculation is completed.