US12327195B2 - Automated feature generation for sensor subset selection - Google Patents
- Publication number
- US12327195B2 (application US16/209,486)
- Authority
- US
- United States
- Prior art keywords
- sensors
- independent variables
- computer programs
- population
- aircraft
- Prior art date
- Legal status: Active, expires (the legal status is an assumption and is not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B64—AIRCRAFT; AVIATION; COSMONAUTICS
- B64D—EQUIPMENT FOR FITTING IN OR TO AIRCRAFT; FLIGHT SUITS; PARACHUTES; ARRANGEMENT OR MOUNTING OF POWER PLANTS OR PROPULSION TRANSMISSIONS IN AIRCRAFT
- B64D45/00—Aircraft indicators or protectors not otherwise provided for
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B64—AIRCRAFT; AVIATION; COSMONAUTICS
- B64F—GROUND OR AIRCRAFT-CARRIER-DECK INSTALLATIONS SPECIALLY ADAPTED FOR USE IN CONNECTION WITH AIRCRAFT; DESIGNING, MANUFACTURING, ASSEMBLING, CLEANING, MAINTAINING OR REPAIRING AIRCRAFT, NOT OTHERWISE PROVIDED FOR; HANDLING, TRANSPORTING, TESTING OR INSPECTING AIRCRAFT COMPONENTS, NOT OTHERWISE PROVIDED FOR
- B64F5/00—Designing, manufacturing, assembling, cleaning, maintaining or repairing aircraft, not otherwise provided for; Handling, transporting, testing or inspecting aircraft components, not otherwise provided for
- B64F5/60—Testing or inspecting aircraft components or systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0736—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in functional embedded systems, i.e. in a data processing system designed as a combination of hardware and software dedicated to performing a certain function
- G06F11/0739—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in functional embedded systems, i.e. in a data processing system designed as a combination of hardware and software dedicated to performing a certain function in a data processing system embedded in automotive or aircraft systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B64—AIRCRAFT; AVIATION; COSMONAUTICS
- B64D—EQUIPMENT FOR FITTING IN OR TO AIRCRAFT; FLIGHT SUITS; PARACHUTES; ARRANGEMENT OR MOUNTING OF POWER PLANTS OR PROPULSION TRANSMISSIONS IN AIRCRAFT
- B64D45/00—Aircraft indicators or protectors not otherwise provided for
- B64D2045/0085—Devices for aircraft health monitoring, e.g. monitoring flutter or vibration
Definitions
- the present disclosure relates generally to machine learning and, in particular, to automated feature generation for sensor subset selection.
- Machine learning is a process to analyze data in which the dataset is used to determine a machine learning model (also called a rule or a function) that maps input data (also called explanatory variables or predictors) to output data (also called dependent variables or response variables) according to a machine learning algorithm.
- A broad array of machine learning algorithms is available, with new algorithms the subject of active research.
- One type of machine learning is supervised learning in which a model is trained with a dataset including known output data for a sufficient number of input data. Once a model is trained, it may be deployed, i.e., applied to new input data to predict the expected output.
- Machine learning may be applied to a number of different types of problems such as regression problems and classification problems.
- In regression problems, the output data includes numeric values such as a voltage, a pressure, or a number of cycles.
- In classification problems, the output data includes labels, classes, categories (e.g., pass-fail, healthy-faulty, failure type) and the like.
- machine learning may be applied to classify aircraft or aircraft components as healthy or faulty from measurements of properties recorded by an airborne flight recorder, such as a quick access recorder (QAR) of an aircraft that receives its input (measurements) from sensors or avionic systems onboard the aircraft.
- Independent variables are measurable properties or characteristics of what is being observed, and the selection or generation of relevant features is often an integral part of machine learning.
- a large number of independent variables are observed. Because of time and computing resource requirements (processing and memory requirements in particular), it is often impractical to include all of the independent variables in the model. And there may be some independent variables that are redundant or irrelevant (uncorrelated) to the dependent (response) variable.
- This dependent variable could be any of a number of different variables.
- the dependent variable could be a flight deck effect or a condition of the aircraft or one or more parts of the aircraft.
- a data analyst may want to quickly and automatically down-select from thousands of sensors to just dozens of sensors that the analyst can focus on to find root cause or build a predictive maintenance model.
- the dependent variable may be measurements from one or more of the sensors themselves.
- an engineer may want to detect when a combination of sensors is able to, within some level of fidelity, recreate the measurements from another sensor.
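As a toy illustration of this virtual-sensor idea (the sensor values and the two-predictor linear model are invented for this sketch, not taken from the disclosure), one could check how well two sensors recreate a third via least squares:

```python
# Hypothetical illustration: test whether two sensors can recreate a third
# within some fidelity, by fitting y ~ a*x1 + b*x2 with least squares.

def fit_two_sensor_model(x1, x2, y):
    """Solve y ~ a*x1 + b*x2 via the 2x2 normal equations."""
    s11 = sum(v * v for v in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    sy1 = sum(u * v for u, v in zip(x1, y))
    sy2 = sum(u * v for u, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    a = (sy1 * s22 - sy2 * s12) / det
    b = (sy2 * s11 - sy1 * s12) / det
    return a, b

def rmse(x1, x2, y, a, b):
    """Root-mean-square reconstruction error of the fitted model."""
    n = len(y)
    return (sum((a * u + b * v - w) ** 2
                for u, v, w in zip(x1, x2, y)) / n) ** 0.5

# Toy data: sensor_c is an exact linear blend of sensors a and b,
# so it is redundant and could be dropped from the subset.
sensor_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sensor_b = [2.0, 1.0, 4.0, 3.0, 6.0]
sensor_c = [2 * u + 0.5 * v for u, v in zip(sensor_a, sensor_b)]

a, b = fit_two_sensor_model(sensor_a, sensor_b, sensor_c)
error = rmse(sensor_a, sensor_b, sensor_c, a, b)  # ~0: sensor_c is recreatable
```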
- A multivariate approach might be to apply a machine learning technique such as a random forest to the data and use its feature importance capability. But training a random forest on all of the raw, multivariate sensor data is infeasible.
- the time series data needs to be reduced through feature extraction first before a machine learning model can be built and deployed.
- Feature extraction introduces at least two further problems. First, simple feature extraction such as straight statistics of the raw time series may hide important temporal and/or interrelated behaviors present in the raw data.
- Second, manually defining high-fidelity feature extractors for all the sensors is difficult and time-consuming, and is therefore essentially a reformulation of the original problem.
- Example implementations of the present disclosure are directed to selecting a subset of independent variables from a plurality of independent variables to predict a dependent variable.
- Example implementations apply genetic programming to a multivariate time series of independent variables.
- existing genetic programming libraries can be extended to enable their application to multivariate time series data, and a library of primitive functions can be used to allow arbitrary combinations of time series transformations in an evolving genetic programming tree.
- Example implementations also leverage multiple runs of genetic programming.
- In traditional genetic programming optimization, a single run of genetic programming is performed to find the best individual.
- Here, runs of genetic programming are iteratively performed to estimate the importance of sensors, by an evaluation process that tracks sensor usage in conjunction with tree fitness throughout multiple independent runs of the genetic programming.
- Example implementations take into account issues with existing solutions by working directly with raw sensor data, and automatically generating, evaluating, and improving combinations of sensors. This avoids the potential loss of valuable information through arbitrary feature extraction, and allows the detection and characterization of more complex interactions between multiple sensors.
- the present disclosure thus includes, without limitation, the following example implementations.
- Some example implementations provide a method of selecting a subset of independent variables from a plurality of independent variables to predict a dependent variable, the method comprising accessing a multivariate time series including observations of data, each of the observations of data including or indicating values of the plurality of independent variables, and a value of the dependent variable.
- the method further comprises iteratively performing runs of genetic programming on groups of independent variables from the plurality of independent variables, including for an iteration of a plurality of iterations: randomly generating a population of computer programs from a group of independent variables selected from the plurality of independent variables, and primitive functions selected from a library of primitive functions, to predict the dependent variable; iteratively transforming the population of computer programs into new generations of the population of computer programs, and including sub-rankings of the group of independent variables based on a quantitative fitness of respective computer programs in the population of computer programs and the new generations of the population of computer programs to predict the dependent variable, the quantitative fitness being determined according to selected fitness criterion; and producing a ranking of the group of independent variables from the sub-rankings of the group of independent variables.
- the method further comprises producing an aggregate ranking of the plurality of independent variables from the ranking of the group of independent variables over the plurality of iterations and selecting the subset of independent variables from the aggregate ranking of the plurality of independent variables, and according to selected optimization criterion.
- iteratively transforming the population of computer programs includes for a first sub-iteration of a plurality of sub-iterations: executing the population of computer programs over values of the group of independent variables for each of the observations of data to produce predictions of the dependent variable; determining the quantitative fitness of the respective computer programs in the population of computer programs according to the selected fitness criterion, and based on the predictions and the value of the dependent variable; producing a first sub-ranking of the group of independent variables based on the quantitative fitness; and generating a first new generation of the population of computer programs for a second sub-iteration of the plurality of sub-iterations, from the population of computer programs, according to an evolutionary algorithm, and based on the quantitative fitness.
- iteratively transforming the population of computer programs includes for a sub-iteration of a plurality of sub-iterations: executing a new generation of the population of computer programs from a preceding sub-iteration of the plurality of sub-iterations, over values of the group of independent variables for each of the observations of data to produce predictions of the dependent variable; determining the quantitative fitness of the respective computer programs in the new generation of the population of computer programs according to the selected fitness criterion, and based on the predictions and the value of the dependent variable; producing a sub-ranking of the group of independent variables based on the quantitative fitness; and generating a next new generation of the population of computer programs for a next sub-iteration of the plurality of sub-iterations, from the new generation of the population of computer programs, according to an evolutionary algorithm, and based on the quantitative fitness.
- the selected fitness criterion includes accuracy, correlation or error rate of predictions of the dependent variable from the respective computer programs relative to values of the dependent variable from the observations of data.
- the plurality of independent variables are a plurality of environmental conditions measurable by a plurality of sensors
- the values of the plurality of independent variables are measurements of the plurality of environmental conditions from the plurality of sensors
- the aggregate ranking of the plurality of independent variables is an aggregate ranking of the plurality of sensors
- selecting the subset of independent variables includes selecting a subset of sensors from the plurality of sensors.
- the selected optimization criterion includes a number of sensors in the subset of sensors, or one or more quantitative properties that define sensors of the plurality of sensors.
- the one or more quantitative properties that define the sensors of the plurality of sensors include cost, weight, power consumption, reliability, maintainability, or complexity of installation.
- the observations of data are observations of flight data for a plurality of flights of an aircraft, for each flight of which the measurements of the plurality of environmental conditions are measurements recorded during the flight by an airborne flight recorder from the plurality of sensors onboard the aircraft.
- the value of the dependent variable is an indication of a condition of the aircraft or one or more parts of the aircraft
- the method further comprises at least selecting the subset of sensors as a set of features for use in building a machine learning model to predict the condition of the aircraft or one or more parts of the aircraft; building the machine learning model using a machine learning algorithm, the set of features, and a training set; and outputting the machine learning model for deployment to predict and thereby produce predictions of the condition of the aircraft or one or more parts of the aircraft for additional observations of the flight data that exclude the indication of the condition of the aircraft or one or more parts of the aircraft.
- Some example implementations provide an apparatus for selecting a subset of independent variables from a plurality of independent variables to predict a dependent variable.
- the apparatus comprises a memory configured to store computer-readable program code; and processing circuitry configured to access the memory, and execute the computer-readable program code to cause the apparatus to at least perform the method of any preceding example implementation, or any combination of any preceding example implementations.
- Some example implementations provide a computer-readable storage medium for selecting a subset of independent variables from a plurality of independent variables to predict a dependent variable.
- the computer-readable storage medium is non-transitory and has computer-readable program code stored therein that in response to execution by processing circuitry, causes an apparatus to at least perform the method of any preceding example implementation, or any combination of any preceding example implementations.
- FIG. 1 illustrates a system for selecting a subset of independent variables from a plurality of independent variables to predict a dependent variable, according to example implementations of the present disclosure
- FIG. 2 is a flowchart illustrating various steps in a method of selecting a subset of independent variables from a plurality of independent variables to predict a dependent variable, according to example implementations;
- FIG. 3 is a flowchart illustrating in greater detail various steps in a method of selecting a subset of independent variables from a plurality of independent variables to predict a dependent variable, according to example implementations.
- FIG. 4 illustrates an apparatus according to some example implementations.
- Example implementations of the present disclosure are directed to selecting a subset of independent variables from a plurality of independent variables to predict a dependent variable.
- Example implementations will be primarily described in the context of automated feature generation for selection of a subset of sensors of an aircraft or other vehicle or manufactured system. It should be understood that example implementations may be applied in a number of contexts, some of which are described in greater detail below.
- example implementations assume a plurality of sensors generating multivariate time series data, a library of primitive functions, and selected optimization criteria.
- example implementations use genetic programming to evolve computer programs that apply combinations of primitive functions to subsets of sensor data (these computer programs at times are referred to as “feature extractors”).
- Example implementations track and aggregate the use and fitness of individual sensors over multiple independent runs of genetic programming to estimate the importance of each sensor. The list of sensors ranked by importance may then identify subsets that best meet the optimization criterion.
- FIG. 1 illustrates a system 100 for selecting a subset of independent variables from a plurality of independent variables to predict a dependent variable, according to example implementations of the present disclosure.
- the system may include any of a number of different subsystems (each an individual system) for performing one or more functions or operations.
- the system includes at least one source 102 of data, an input 104 , and a genetic programming (GP) engine 106 with a population generator 108 , an evolution engine 110 and a ranking engine 112 .
- the system includes an aggregate ranking engine 114 and a subset selection engine 116 .
- the subsystems including the source 102 , input 104 , GP engine 106 , aggregate ranking engine 114 and subset selection engine 116 may be co-located or directly coupled to one another, or in some examples, various ones of the subsystems may communicate with one another across one or more computer networks 118 . Further, although shown as part of the system 100 , it should be understood that any one or more of the above may function or operate as a separate system without regard to any of the other subsystems. It should also be understood that the system may include one or more additional or alternative subsystems than those shown in FIG. 1 .
- the source 102 is a source of data, and includes a memory that may be located at a single source or distributed across multiple sources.
- the memory may store data such as a multivariate time series including observations of data, with each of the observations of data including or indicating values of the plurality of independent variables, and a value of the dependent variable. These observations may be considered labeled time series data (values of the independent variables labeled with a value of the dependent variable).
- the data may be stored in a number of different manners, such as in a database or flat files of any of a number of different types or formats.
- the plurality of independent variables are a plurality of environmental conditions measurable by a plurality of sensors, and the values of the plurality of independent variables are measurements of the plurality of environmental conditions from the plurality of sensors.
- the observations of data are observations of flight data for a plurality of flights of an aircraft.
- the measurements of the plurality of environmental conditions are measurements recorded during the flight by an airborne flight recorder from the plurality of sensors onboard the aircraft.
- the value of the dependent variable is an indication of a condition of the aircraft or one or more parts of the aircraft (e.g., pass-fail, healthy-faulty, failure type, etc.).
- the subset of sensors is selected as a set of features for use in building a machine learning model to predict the condition of the aircraft or one or more parts of the aircraft.
- the machine learning model is built using a machine learning algorithm, the set of features, and a training set. And the machine learning model is output for deployment to predict and thereby produce predictions of the condition of the aircraft or one or more parts of the aircraft for additional observations of the flight data that exclude the indication of the condition of the aircraft or one or more parts of the aircraft.
- a prediction of the condition of the aircraft or part(s) of the aircraft may indicate an impending fault or failure, and cause an alert to maintenance personnel.
- maintenance personnel may perform maintenance on the aircraft consistent with the prediction. This may include maintenance personnel replacing or repairing one or more parts of the aircraft at the root cause of or otherwise implicated in the impending fault or failure.
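The model-building and deployment steps above might be sketched as follows; the nearest-centroid classifier and the sensor names are stand-ins chosen for brevity, since the disclosure leaves the machine learning algorithm open:

```python
# Simplified sketch (assumed names, toy classifier): extract features from the
# selected sensor subset and fit a nearest-centroid model that predicts a
# healthy/faulty condition label for new, unlabeled flight data.
import statistics

def extract(obs, sensors):
    # One feature per selected sensor: the mean of its time series.
    return [statistics.fmean(obs[s]) for s in sensors]

def fit_centroids(train, labels, sensors):
    # Average the feature vectors of each condition label.
    groups = {}
    for obs, label in zip(train, labels):
        groups.setdefault(label, []).append(extract(obs, sensors))
    return {label: [statistics.fmean(col) for col in zip(*rows)]
            for label, rows in groups.items()}

def predict(obs, centroids, sensors):
    # Assign the label of the nearest centroid (squared Euclidean distance).
    x = extract(obs, sensors)
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

selected = ["egt"]                          # subset chosen by the GP stage
train = [{"egt": [500.0, 510.0]}, {"egt": [505.0, 515.0]},
         {"egt": [650.0, 660.0]}, {"egt": [655.0, 665.0]}]
labels = ["healthy", "healthy", "faulty", "faulty"]
centroids = fit_centroids(train, labels, selected)

new_flight = {"egt": [648.0, 652.0]}        # no condition label recorded
prediction = predict(new_flight, centroids, selected)  # -> "faulty"
```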
- the input 104 is configured to access the multivariate time series (including the observations of data) from one or more sources 102 .
- the GP engine 106 is configured to iteratively perform runs of genetic programming on groups of independent variables from the plurality of independent variables. For an iteration of a plurality of iterations, this includes the population generator 108 , evolution engine 110 and ranking engine 112 .
- the population generator 108 is configured to randomly (or pseudorandomly) generate a population of computer programs from a group of independent variables selected from the plurality of independent variables, and primitive functions selected from a library of primitive functions 120 , to predict the dependent variable.
- Each computer program may include one or more independent variables and one or more primitive functions.
- a primitive function is a relationship or expression that maps input data to output data.
- suitable primitive functions include mathematical operations such as addition, subtraction, multiplication, division, sine, cosine, tangent, log, exponential and the like.
- Other examples include minimum, maximum, average, standard deviation, kurtosis, skewness, variance, quantile and the like.
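A minimal sketch of such a library, and of composing primitives arbitrarily in a program tree, might look like the following; the encoding of programs as nested tuples and all names are illustrative assumptions, not the patent's representation:

```python
# Assumed sketch: a library of primitive functions and a recursive evaluator
# that applies arbitrary combinations of them to multivariate time series.
import math
import statistics

# Elementwise primitives map series to series; reducers map a series to a scalar.
PRIMITIVES = {
    "add":  lambda s, t: [u + v for u, v in zip(s, t)],
    "sub":  lambda s, t: [u - v for u, v in zip(s, t)],
    "mul":  lambda s, t: [u * v for u, v in zip(s, t)],
    "sin":  lambda s: [math.sin(v) for v in s],
    "mean": lambda s: statistics.fmean(s),
    "max":  lambda s: max(s),
    "std":  lambda s: statistics.pstdev(s),
}

def evaluate(node, observation):
    """Evaluate a nested-tuple program tree against one observation.

    A leaf is a sensor name (string); an internal node is
    (primitive_name, child, ...).
    """
    if isinstance(node, str):
        return observation[node]            # raw sensor time series
    name, *children = node
    args = [evaluate(c, observation) for c in children]
    return PRIMITIVES[name](*args)

# One observation: two sensors, each a short time series.
obs = {"s1": [1.0, 2.0, 3.0], "s2": [4.0, 5.0, 6.0]}

# mean(add(s1, s2)) -> mean([5, 7, 9]) = 7.0
feature = evaluate(("mean", ("add", "s1", "s2")), obs)
```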
- the evolution engine 110 is configured to iteratively transform the population of computer programs into new generations of the population of computer programs. This includes sub-rankings of the group of independent variables based on a quantitative fitness of respective computer programs in the population of computer programs and the new generations of the population of computer programs to predict the dependent variable, with the quantitative fitness being determined according to selected fitness criterion. Examples of suitable fitness criterion include accuracy, correlation or error rate of predictions of the dependent variable from the respective computer programs relative to values of the dependent variable from the observations of data, and the like.
- the ranking engine 112 is then configured to produce a ranking of the group of independent variables from the sub-rankings of the group of independent variables.
- the evolution engine 110 is configured to execute the population of computer programs over values of the group of independent variables for each of the observations of data to produce predictions of the dependent variable.
- the evolution engine is also configured to determine the quantitative fitness of the respective computer programs in the population of computer programs according to the selected fitness criterion, and based on the predictions and the value of the dependent variable.
- the evolution engine is configured to produce a first sub-ranking of the group of independent variables based on the quantitative fitness.
- the evolution engine is configured to generate a first new generation of the population of computer programs for a second sub-iteration of the plurality of sub-iterations, from the population of computer programs, according to an evolutionary algorithm, and based on the quantitative fitness.
- the evolution engine 110 is configured to execute a new generation of the population of computer programs from a preceding sub-iteration of the plurality of sub-iterations, over values of the group of independent variables for each of the observations of data to produce predictions of the dependent variable.
- the evolution engine is configured to determine the quantitative fitness of the respective computer programs in the new generation of the population of computer programs according to the selected fitness criterion, and based on the predictions and the value of the dependent variable.
- the evolution engine is configured to produce a sub-ranking of the group of independent variables based on the quantitative fitness.
- the evolution engine is configured to generate a next new generation of the population of computer programs for a next sub-iteration of the plurality of sub-iterations, from the new generation of the population of computer programs, according to an evolutionary algorithm, and based on the quantitative fitness.
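One sub-iteration of this evolve loop could be sketched as below; the toy program encoding (a sensor paired with a single reducer), the correlation-based fitness, and the keep-and-mutate breeding step are simplifying assumptions for illustration, not the disclosed implementation:

```python
# Assumed sketch of one sub-iteration: execute the population over the
# observations, score fitness by correlation with the dependent variable,
# sub-rank sensors, and breed the next generation from the fitter half.
import random
import statistics

REDUCERS = {"mean": statistics.fmean, "max": max, "min": min}

def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0

def fitness(program, observations, targets):
    sensor, reducer = program
    feats = [REDUCERS[reducer](obs[sensor]) for obs in observations]
    return abs(pearson(feats, targets))     # selected fitness criterion

def sub_iteration(population, observations, targets, rng):
    scored = sorted(population,
                    key=lambda p: fitness(p, observations, targets),
                    reverse=True)
    # Sub-ranking: score each sensor by the best program that uses it.
    sub_rank = {}
    for prog in scored:
        sub_rank.setdefault(prog[0], fitness(prog, observations, targets))
    # Next generation: fitter half survives; mutate reducers to refill.
    survivors = scored[: len(scored) // 2]
    children = [(s, rng.choice(list(REDUCERS))) for s, _ in survivors]
    return survivors + children, sub_rank

rng = random.Random(0)
observations = [{"s1": [i, i + 1], "s2": [9.0, 1.0]} for i in range(6)]
targets = [float(2 * i) for i in range(6)]   # driven by s1 only
population = [("s1", "mean"), ("s1", "max"), ("s2", "mean"), ("s2", "min")]
next_gen, sub_rank = sub_iteration(population, observations, targets, rng)
```

In this toy run, programs built on `s1` correlate perfectly with the target while `s2` (constant across observations) scores zero, so the next generation is dominated by `s1` programs.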
- the aggregate ranking engine 114 is configured to produce an aggregate ranking of the plurality of independent variables from the ranking of the group of independent variables over the plurality of iterations.
- the subset selection engine 116 is configured to select the subset of independent variables 122 from the aggregate ranking of the plurality of independent variables, and according to selected optimization criterion. Examples of suitable optimization criteria include correlation, accuracy, F1 score (also referred to as F-score or F-measure), mean-square error (MSE), p-value (also referred to as probability value or asymptotic significance) and the like.
- the aggregate ranking of the plurality of independent variables includes an aggregate ranking of the plurality of sensors.
- the subset selection engine 116 being configured to select the subset of independent variables 122 includes the subset selection engine being configured to select a subset of sensors from the plurality of sensors.
- the selected optimization criterion includes a number of sensors in the subset of sensors, or one or more quantitative properties that define sensors of the plurality of sensors.
- the one or more quantitative properties that define the sensors of the plurality of sensors include cost, weight, power consumption, reliability, maintainability, or complexity of installation.
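One way such optimization criteria might be applied (the sensor names, weights, and greedy strategy are invented for this sketch) is a walk down the aggregate ranking that respects a sensor-count limit and a total weight budget:

```python
# Illustrative sketch: keep the most important sensors that fit quantitative
# constraints, here a maximum sensor count and a total weight budget.

def select_subset(ranked_sensors, weights, max_sensors, weight_budget):
    """ranked_sensors: sensor names ordered most important first."""
    subset, total_weight = [], 0.0
    for sensor in ranked_sensors:
        if len(subset) == max_sensors:
            break
        if total_weight + weights[sensor] <= weight_budget:
            subset.append(sensor)
            total_weight += weights[sensor]
    return subset

ranking = ["egt", "n1", "oil_temp", "vib", "fuel_flow"]
weights = {"egt": 0.4, "n1": 0.3, "oil_temp": 1.2,
           "vib": 0.2, "fuel_flow": 0.5}
# egt (0.4) and n1 (0.3) fit; oil_temp would exceed the budget; vib (0.2) fits.
chosen = select_subset(ranking, weights, max_sensors=3, weight_budget=1.0)
```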
- FIG. 2 is a flowchart illustrating various steps in a method 200 of selecting a subset of independent variables from a plurality of independent variables to predict a dependent variable, according to example implementations of the present disclosure.
- the method includes accessing a multivariate time series including observations of data, each of the observations of data including or indicating values of the plurality of independent variables, and a value of the dependent variable.
- the method includes iteratively performing runs of genetic programming on groups of independent variables from the plurality of independent variables.
- performing the runs of genetic programming includes randomly generating a population of computer programs from a group of independent variables selected from the plurality of independent variables, and primitive functions selected from a library of primitive functions, to predict the dependent variable, as shown at block 206 .
- the method includes iteratively transforming the population of computer programs into new generations of the population of computer programs, and including sub-rankings of the group of independent variables based on a quantitative fitness of respective computer programs in the population of computer programs and the new generations of the population of computer programs to predict the dependent variable, with the quantitative fitness being determined according to selected fitness criterion, as shown at block 208 .
- the method includes producing a ranking of the group of independent variables from the sub-rankings of the group of independent variables, as shown at block 210 .
- the method includes producing an aggregate ranking of the plurality of independent variables from the ranking of the group of independent variables over the plurality of iterations. And as shown at block 214 , the method includes selecting the subset of independent variables from the aggregate ranking of the plurality of independent variables, and according to selected optimization criterion.
- FIG. 3 is a flowchart illustrating in greater detail various steps in a method 300 of selecting a subset of sensors from a plurality of sensors to predict a dependent variable, according to example implementations.
- the method includes accessing a multivariate time series including observations of flight data for a plurality of flights of an aircraft. For each flight, the observations of data include or indicate measurements 302 of a plurality of environmental conditions measurable by the plurality of sensors, and a value of the dependent variable 304 .
- the sensors may produce associated metadata 306 (structured data that provides information about the measurements), and this metadata may indicate the plurality of sensors and/or the environmental conditions measured by the plurality of sensors.
- the method includes iteratively performing runs of genetic programming on groups of sensors from the plurality of sensors, as shown at blocks 308 - 318 . For an iteration of a plurality of iterations, this includes randomly generating a population of computer programs from a group of sensors selected from the plurality of sensors, and primitive functions selected from a library of primitive functions, to predict the dependent variable, as shown at block 308 .
- the method includes iteratively transforming the population of computer programs into new generations of the population of computer programs, as shown at blocks 310 - 316 .
- the population of computer programs is executed over measurements 302 of the group of sensors for each of the observations to produce predictions of the dependent variable 304 .
- the quantitative fitness of the respective computer programs in the population of computer programs is evaluated according to a selected fitness criterion, based on the predictions and the value of the dependent variable, as shown at block 312.
- a sub-ranking of the group of sensors is produced based on the quantitative fitness, as shown at block 314 .
- a next new generation of the population of computer programs is generated for a next sub-iteration of the plurality of sub-iterations, from the current generation of the population of computer programs, according to an evolutionary algorithm and based on the quantitative fitness, as shown at block 316. This may repeat until reaching a maximum number of sub-iterations or until the quantitative fitness of the new generation converges.
- the iteration of genetic programming also includes producing a ranking 320 of the group of sensors from the sub-rankings of the group of sensors.
- This iterative run of genetic programming can repeat over the plurality of iterations until reaching a maximum number of iterations, with each iteration initially differing in the randomly generated population of computer programs (block 308), including the selected group of sensors and primitive functions.
- the method then includes producing an aggregate ranking 322 of the plurality of sensors from the ranking of the group of sensors over the plurality of iterations, as shown at block 324. And the method includes selecting the subset of sensors from the aggregate ranking of the plurality of sensors, according to a selected optimization criterion.
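The run described at blocks 308-318 can be sketched end-to-end as a toy genetic-programming loop. Everything below is an illustrative assumption rather than the disclosed implementation: the flat (primitive, sensor, sensor) program shape, the mean-absolute-error fitness, the frequency-based sub-ranking, and the tournament-plus-mutation evolutionary step. Real GP engines evolve full expression trees.

```python
import random

SENSORS = ["s0", "s1", "s2"]          # hypothetical sensor group (block 308)
PRIMITIVES = [lambda a, b: a + b, lambda a, b: a - b, lambda a, b: a * b]

def random_program():
    # A "program" here is just (primitive, sensor, sensor); real GP uses trees.
    return (random.choice(PRIMITIVES), random.choice(SENSORS), random.choice(SENSORS))

def run(program, observation):
    f, x, y = program
    return f(observation[x], observation[y])

def fitness(program, observations, target):
    # Mean absolute error of the program's predictions (lower is better).
    errs = [abs(run(program, o) - o[target]) for o in observations]
    return sum(errs) / len(errs)

def sub_ranking(population, fits):
    # Rank sensors by how often they appear in the fitter half of the population.
    ordered = [p for _, p in sorted(zip(fits, population), key=lambda t: t[0])]
    counts = {s: 0 for s in SENSORS}
    for _, x, y in ordered[: len(ordered) // 2]:
        counts[x] += 1
        counts[y] += 1
    return sorted(SENSORS, key=counts.get, reverse=True)

def evolve(population, fits):
    # Tournament selection plus point mutation: one simple evolutionary step.
    def pick():
        a, b = random.sample(range(len(population)), 2)
        return population[a] if fits[a] < fits[b] else population[b]
    return [(random.choice(PRIMITIVES),) + pick()[1:] for _ in population]

random.seed(0)
obs = [{"s0": i, "s1": 2 * i, "s2": 1.0, "y": 3 * i} for i in range(1, 6)]
pop = [random_program() for _ in range(20)]
for _ in range(10):                    # sub-iterations (blocks 310-316)
    fits = [fitness(p, obs, "y") for p in pop]
    ranking = sub_ranking(pop, fits)   # block 314
    pop = evolve(pop, fits)            # block 316
```

Repeating this loop with a different random seed and a different sensor group corresponds to the outer iterations whose per-run rankings feed the aggregate ranking.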
- development of a machine learning model may include selecting the subset of independent variables as a set of features for use in building the machine learning model to predict the dependent variable (e.g., condition of an aircraft).
- the machine learning model may then be built using a machine learning algorithm, the set of features, and an appropriate training set that may in some examples come from the observations of data from which the subset of independent variables is selected.
- This machine learning model may then be output for deployment to produce predictions of the dependent variable for additional observations of the data that exclude the value of the dependent variable.
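The build-and-deploy flow just described might look like the following, with a 1-nearest-neighbour "model" standing in for whatever machine learning algorithm is actually used; the sensor names, feature subset, and fault label are hypothetical.

```python
# The selected sensor subset becomes the feature set; a model is fit on
# observations that include the dependent variable ("fault"), then scores
# new observations that lack it.

SELECTED = ["s0", "s1"]               # subset chosen from the aggregate ranking

def featurize(observation):
    return [observation[s] for s in SELECTED]

def train(observations, target):
    # "Training" a 1-NN model is just memorizing labelled feature vectors.
    return [(featurize(o), o[target]) for o in observations]

def predict(model, observation):
    feats = featurize(observation)
    dist = lambda v: sum((a - b) ** 2 for a, b in zip(v, feats))
    return min(model, key=lambda pair: dist(pair[0]))[1]

train_set = [
    {"s0": 0.1, "s1": 0.2, "s2": 5.0, "fault": 0},
    {"s0": 0.9, "s1": 0.8, "s2": 5.1, "fault": 1},
]
model = train(train_set, "fault")
print(predict(model, {"s0": 0.85, "s1": 0.75, "s2": 4.9}))  # prints 1
```

Note that sensor s2 is simply ignored at featurization time, which is the practical payoff of subset selection: the deployed model never needs that sensor's data at all.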
- example implementations of the present disclosure may be applied in a number of contexts, including automated feature generation for selection of a subset of sensors of an aircraft or other vehicle or manufactured system.
- Three additional example contexts described below include (1) human-interpretable event characterization, (2) feature extractor optimization, and (3) prediction accuracy optimization.
- example implementations may be used to find behaviors in the data that characterize events of interest. For example, “for 9 out of 10 failures, the actual position of valve X is lagging the commanded position by greater than Z seconds.”
- the optimization criterion includes the number of failure cases characterized (e.g., 9 out of 10), including a penalty on the number of non-failure cases also characterized as failure cases (i.e., false positives).
- the sensors in this example include valve X's actual position and commanded position, and the primitive functions may include “lag” and “greater than” operators. In this context, one may look to examine the best-performing computer program(s) generated by the GP engine 106 .
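A best-performing program composed from those two primitives might reduce to something like the sketch below; the sampling rate, series encoding, and threshold are illustrative assumptions, not values from the disclosure.

```python
# "Actual valve position lags commanded position by more than Z seconds",
# expressed over two discretely sampled position series.

def lag_seconds(commanded, actual, dt=1.0):
    """Seconds between a command change and the first sample where the
    actual position matches the new command (None if it never does)."""
    for i in range(1, len(commanded)):
        if commanded[i] != commanded[i - 1]:          # command change
            for j in range(i, len(actual)):
                if actual[j] == commanded[i]:
                    return (j - i) * dt               # the "lag" operator
            return None
    return 0.0

def characterizes_failure(commanded, actual, z=2.5, dt=1.0):
    lag = lag_seconds(commanded, actual, dt)
    return lag is None or lag > z                     # the "greater than" operator

cmd  = [0, 1, 1, 1, 1, 1, 1]
slow = [0, 0, 0, 0, 1, 1, 1]   # responds 3 samples (3 s) after the command
print(characterizes_failure(cmd, slow))   # True: 3.0 s > 2.5 s
```

Because the evolved program is a small composition of named operators over named sensors, it can be read back to an engineer as the plain-language rule quoted above, which is the point of this context.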
- example implementations may be used to automatically refine an existing, manually-produced feature extractor (computer program).
- the manually-produced feature extractor may be defined as “the number of distinct times at which the actual position of valve X is lagging the commanded position of valve X by greater than 2.5 seconds.”
- the optimization criterion reflects the goal that motivated production of the manually-produced feature extractor (e.g., detect as many failures as possible).
- the sensors may include at least those sensors used in producing the manually-produced feature extractor as well as, in this example, a sensor indicating the phase of flight.
- the primitive functions may include at least the manually-produced feature extractor itself but also, in this example, a “filter” operator.
- the context here may optimize the manually-produced feature extractor into something like, “the number of distinct times at which the actual position of valve X is lagging the commanded position of valve X by greater than 1.7 seconds but only when the aircraft is in the climb or descent phase of flight.”
- example implementations have optimized the manually-produced feature by finding a better threshold (1.7, as found in the data) combined with finding when this behavior is most important (climb and descent).
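The refined extractor reads as the original extractor composed with the "filter" primitive on flight phase. A hedged sketch, with event records, field names, and values that are illustrative only:

```python
# Count distinct lag events exceeding the threshold, but only in the
# flight phases the data showed to matter (the "filter" operator).

THRESHOLD = 1.7                       # seconds, the refined value found in the data
PHASES = {"climb", "descent"}

def refined_extractor(lag_events):
    """Number of distinct times the valve lagged by more than THRESHOLD
    seconds, counting only climb/descent phases."""
    return sum(1 for e in lag_events
               if e["lag"] > THRESHOLD and e["phase"] in PHASES)

events = [
    {"lag": 2.0, "phase": "climb"},
    {"lag": 2.0, "phase": "cruise"},   # filtered out by phase
    {"lag": 1.5, "phase": "descent"},  # under threshold
]
print(refined_extractor(events))  # prints 1
```

Both refinements (the tightened threshold and the phase filter) come out of the data rather than engineering judgment, which is what distinguishes this from the original manually-produced extractor.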
- example implementations may be used to find a set of one or more feature extractors that, when given to a machine learning algorithm, predict some outcome of interest, such as an impending component failure. This is briefly described above, although in this context, one may do more than just use the identified subset of sensors as features, and may also use the actual high-performing computer programs acting on the sensor data as features.
- a machine learning model developed according to example implementations of the present disclosure is deployed in aircraft health management software as an aircraft condition monitoring system report. Flight delays and cancellations are extremely disruptive and costly for airlines. Deployment and use of the machine learning model of example implementations may trim minutes from delays or avoid cancellations by recognizing and alerting to early signs of impending failures, and may thereby significantly contribute to an airline's bottom line. The machine learning model may help to predict faults in advance and provide alerts to avoid unscheduled maintenance.
- Example implementations may be further used in design and manufacture of aircraft or other vehicle or manufactured system (generally a manufactured system).
- Example implementations may inform the design and thereby manufacture in the selected fitness criterion. For example, if example implementations find that n−m sensors adequately perform the same function as that of an original set of n sensors, then cost/weight/complexity of installation/etc. can be optimized in the design of the manufactured system. Or as a slight variation, example implementations can find that a set of sensors A can perform the same function as a set of sensors B (even if the two sets of sensors differ).
- Example implementations may inform the design and thereby manufacture in the design of a component itself (and thereby subsequent manufacture of the component). This is related to the above-described human-interpretable event characterization.
- human-interpretable event characterization may find, for a large portion of the failure modes, that actual in-service behavior deviates from the expectations in the appropriate failure modes and effects analysis (FMEA).
- Example implementations may inform an engineer to re-design a component such that it matches realities in the field. For example: "this valve has been re-designed to make it more robust to altitude changes (climb and descent) and therefore more reliable with respect to this specific failure mode, which has occurred more often than previously anticipated." Without knowing the conditions and behaviors that characterized failures in the field, the engineer has less direction for coming up with a re-design that has the highest chance of increasing the reliability of the component.
- the system 100 and its subsystems including the source 102 , input 104 , GP engine 106 , aggregate ranking engine 114 and a subset selection engine 116 may be implemented by various means.
- Means for implementing the system and its subsystems may include hardware, alone or under direction of one or more computer programs from a computer-readable storage medium.
- one or more apparatuses may be configured to function as or otherwise implement the system and its subsystems shown and described herein.
- the respective apparatuses may be connected to or otherwise in communication with one another in a number of different manners, such as directly or indirectly via a wired or wireless network or the like.
- FIG. 4 illustrates an apparatus 400 according to some example implementations of the present disclosure.
- an apparatus of exemplary implementations of the present disclosure may comprise, include or be embodied in one or more fixed or portable electronic devices. Examples of suitable electronic devices include a smartphone, tablet computer, laptop computer, desktop computer, workstation computer, server computer or the like.
- the apparatus may include one or more of each of a number of components such as, for example, processing circuitry 402 (e.g., processor unit) connected to a memory 404 (e.g., storage device).
- the processing circuitry 402 may be composed of one or more processors alone or in combination with one or more memories.
- the processing circuitry is generally any piece of computer hardware that is capable of processing information such as, for example, data, computer programs and/or other suitable electronic information.
- the processing circuitry is composed of a collection of electronic circuits, some of which may be packaged as an integrated circuit or multiple interconnected integrated circuits (an integrated circuit at times more commonly referred to as a "chip").
- the processing circuitry may be configured to execute computer programs, which may be stored onboard the processing circuitry or otherwise stored in the memory 404 (of the same or another apparatus).
- the processing circuitry 402 may be a number of processors, a multi-core processor or some other type of processor, depending on the particular implementation. Further, the processing circuitry may be implemented using a number of heterogeneous processor systems in which a main processor is present with one or more secondary processors on a single chip. As another illustrative example, the processing circuitry may be a symmetric multi-processor system containing multiple processors of the same type. In yet another example, the processing circuitry may be embodied as or otherwise include one or more ASICs, FPGAs or the like. Thus, although the processing circuitry may be capable of executing a computer program to perform one or more functions, the processing circuitry of various examples may be capable of performing one or more functions without the aid of a computer program. In either instance, the processing circuitry may be appropriately programmed to perform functions or operations according to example implementations of the present disclosure.
- the memory 404 is generally any piece of computer hardware that is capable of storing information such as, for example, data, computer programs (e.g., computer-readable program code 406 ) and/or other suitable information either on a temporary basis and/or a permanent basis.
- the memory may include volatile and/or non-volatile memory, and may be fixed or removable. Examples of suitable memory include random access memory (RAM), read-only memory (ROM), a hard drive, a flash memory, a thumb drive, a removable computer diskette, an optical disk, a magnetic tape or some combination of the above.
- Optical disks may include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W), DVD or the like.
- the memory may be referred to as a computer-readable storage medium.
- the computer-readable storage medium is a non-transitory device capable of storing information, and is distinguishable from computer-readable transmission media such as electronic transitory signals capable of carrying information from one location to another.
- Computer-readable medium as described herein may generally refer to a computer-readable storage medium or computer-readable transmission medium.
- the processing circuitry 402 may also be connected to one or more interfaces for displaying, transmitting and/or receiving information.
- the interfaces may include a communications interface 408 (e.g., communications unit) and/or one or more user interfaces.
- the communications interface may be configured to transmit and/or receive information, such as to and/or from other apparatus(es), network(s) or the like.
- the communications interface may be configured to transmit and/or receive information by physical (wired) and/or wireless communications links. Examples of suitable communication interfaces include a network interface controller (NIC), wireless NIC (WNIC) or the like.
- the user interfaces may include a display 410 and/or one or more user input interfaces 412 (e.g., input/output unit).
- the display may be configured to present or otherwise display information to a user, suitable examples of which include a liquid crystal display (LCD), light-emitting diode display (LED), plasma display panel (PDP) or the like.
- the user input interfaces may be wired or wireless, and may be configured to receive information from a user into the apparatus, such as for processing, storage and/or display. Suitable examples of user input interfaces include a microphone, image or video capture device, keyboard or keypad, joystick, touch-sensitive surface (separate from or integrated into a touchscreen), biometric sensor or the like.
- the user interfaces may further include one or more interfaces for communicating with peripherals such as printers, scanners or the like.
- program code instructions may be stored in memory, and executed by processing circuitry that is thereby programmed, to implement functions of the systems, subsystems, tools and their respective elements described herein.
- any suitable program code instructions may be loaded onto a computer or other programmable apparatus from a computer-readable storage medium to produce a particular machine, such that the particular machine becomes a means for implementing the functions specified herein.
- These program code instructions may also be stored in a computer-readable storage medium that can direct a computer, processing circuitry or other programmable apparatus to function in a particular manner to thereby generate a particular machine or particular article of manufacture.
- the instructions stored in the computer-readable storage medium may produce an article of manufacture, where the article of manufacture becomes a means for implementing functions described herein.
- the program code instructions may be retrieved from a computer-readable storage medium and loaded into a computer, processing circuitry or other programmable apparatus to configure the computer, processing circuitry or other programmable apparatus to execute operations to be performed on or by the computer, processing circuitry or other programmable apparatus.
- Retrieval, loading and execution of the program code instructions may be performed sequentially such that one instruction is retrieved, loaded and executed at a time. In some example implementations, retrieval, loading and/or execution may be performed in parallel such that multiple instructions are retrieved, loaded, and/or executed together. Execution of the program code instructions may produce a computer-implemented process such that the instructions executed by the computer, processing circuitry or other programmable apparatus provide operations for implementing functions described herein.
- an apparatus 400 may include a processing circuitry 402 and a computer-readable storage medium or memory 404 coupled to the processing circuitry, where the processing circuitry is configured to execute computer-readable program code 406 stored in the memory. It will also be understood that one or more functions, and combinations of functions, may be implemented by special purpose hardware-based computer systems and/or processing circuitry which perform the specified functions, or combinations of special purpose hardware and program code instructions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/209,486 US12327195B2 (en) | 2018-12-04 | 2018-12-04 | Automated feature generation for sensor subset selection |
Publications (2)
Publication Number | Publication Date |
---|---|
US20200175380A1 US20200175380A1 (en) | 2020-06-04 |
US12327195B2 true US12327195B2 (en) | 2025-06-10 |
Family
ID=70850176
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/209,486 Active 2041-12-18 US12327195B2 (en) | 2018-12-04 | 2018-12-04 | Automated feature generation for sensor subset selection |
Country Status (1)
Country | Link |
---|---|
US (1) | US12327195B2 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11714913B2 (en) * | 2018-10-09 | 2023-08-01 | Visa International Service Association | System for designing and validating fine grained fraud detection rules |
US11200461B2 (en) * | 2018-12-21 | 2021-12-14 | Capital One Services, Llc | Methods and arrangements to identify feature contributions to erroneous predictions |
WO2023121777A1 (en) * | 2021-12-23 | 2023-06-29 | Virtualitics, Inc. | Software engines configured for detection of high impact scenarios with machine learning-based simulation |
US20230236589A1 (en) * | 2022-01-27 | 2023-07-27 | Hitachi, Ltd. | Optimizing execution of multiple machine learning models over a single edge device |
Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5727128A (en) * | 1996-05-08 | 1998-03-10 | Fisher-Rosemount Systems, Inc. | System and method for automatically determining a set of variables for use in creating a process model |
US6272479B1 (en) * | 1997-07-21 | 2001-08-07 | Kristin Ann Farry | Method of evolving classifier programs for signal processing and control |
US20020152037A1 (en) * | 1999-06-17 | 2002-10-17 | Cyrano Sciences, Inc. | Multiple sensing system and device |
US20060126608A1 (en) * | 2004-11-05 | 2006-06-15 | Honeywell International Inc. | Method and apparatus for system monitoring and maintenance |
US20080154811A1 (en) * | 2006-12-21 | 2008-06-26 | Caterpillar Inc. | Method and system for verifying virtual sensors |
US20080177681A1 (en) * | 2007-01-22 | 2008-07-24 | Rosario Helen Geraldine E | Data spiders |
US20080183444A1 (en) * | 2007-01-26 | 2008-07-31 | Grichnik Anthony J | Modeling and monitoring method and system |
US7437335B2 (en) * | 2004-12-07 | 2008-10-14 | Eric Baum | Method and system for constructing cognitive programs |
US20090216393A1 (en) * | 2008-02-27 | 2009-08-27 | James Schimert | Data-driven anomaly detection to anticipate flight deck effects |
US20100049340A1 (en) * | 2007-03-19 | 2010-02-25 | Dow Global Technologies Inc. | Inferential Sensors Developed Using Three-Dimensional Pareto-Front Genetic Programming |
US20110066579A1 (en) * | 2009-09-16 | 2011-03-17 | Oki Electric Industry Co., Ltd. | Neural network system for time series data prediction |
US20120005149A1 (en) * | 2010-06-30 | 2012-01-05 | Raytheon Company | Evidential reasoning to enhance feature-aided tracking |
US20150302163A1 (en) * | 2014-04-17 | 2015-10-22 | Lockheed Martin Corporation | Prognostics and health management system |
US20170166328A1 (en) * | 2015-12-11 | 2017-06-15 | The Boeing Company | Predictive aircraft maintenance systems and methods incorporating classifier ensembles |
US20180077034A1 (en) * | 2016-09-13 | 2018-03-15 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Data acquisition method and apparatus for driverless vehicle |
US20180079532A1 (en) * | 2016-09-19 | 2018-03-22 | Simmonds Precision Products, Inc. | Automated structural interrogation of aircraft components |
US20180288080A1 (en) * | 2017-03-31 | 2018-10-04 | The Boeing Company | On-board networked anomaly detection (onad) modules |
US20190130769A1 (en) * | 2017-10-27 | 2019-05-02 | International Business Machines Corporation | Real-time identification and provision of preferred flight parameters |
US11170295B1 (en) * | 2016-09-19 | 2021-11-09 | Tidyware, LLC | Systems and methods for training a personalized machine learning model for fall detection |
Non-Patent Citations (6)
Title |
---|
Das, A., "Algorithms for Subset Selection in Linear Regression", Symposium on Theory of Computing, ACM, Victoria, British Columbia, Canada, May 2008, pp. 1-10. |
Luke, S., "ECJ Then and Now", George Mason University, GECCO '17 Companion, Jul. 2017, Berlin, Germany, pp. 1-8. |
Mierswa, I. et al., "Automatic Feature Extraction for Classifying Audio Data", Jul. 2004, University of Dortmund, Germany, pp. 1-28. |
Montavon, G. et al., "Explaining NonLinear Classification Decisions with Deep Taylor Decomposition", Pattern Recognition 65, Jan. 2017, pp. 1-14. |
Westbury, C. et al., "Using Genetic Programming to Discover Nonlinear Variable Interactions", Behavior Research Methods, Instruments, & Computers, 35(2), 2003, pp. 202-216. * |
Wikipedia, "Genetic Programming", Oct. 2018, pp. 1-6, retrieved from https://en.wikipedia.org/w/index.pho?title=Genetic_programming&oldid=862701469. |
Legal Events
- FEPP (Fee payment procedure): Entity status set to undiscounted (original event code: BIG.); entity status of patent owner: large entity
- STPP (Information on status: patent application and granting procedure in general): Non-final action mailed
- STPP: Response to non-final office action entered and forwarded to examiner
- STPP: Final rejection mailed
- STPP: Non-final action mailed
- STPP: Response to non-final office action entered and forwarded to examiner
- STPP: Final rejection mailed
- STPP: Response after final action forwarded to examiner
- STPP: Advisory action mailed
- STPP: Docketed new case - ready for examination
- STPP: Non-final action mailed
- STPP: Response to non-final office action entered and forwarded to examiner
- STPP: Notice of allowance mailed -- application received in Office of Publications