In multi-label learning, a single object can be associated with multiple labels simultaneously. When labels follow a random distribution, every labelling has a probability of occurrence, and thus any prediction carries an expected error measured by a predefined loss function. Out of an exponential number of possible labellings, an algorithm should choose the prediction that minimizes the expected error; this is known as loss minimization. This work proves the NP-completeness, with respect to the number of labels, of a specific case of loss minimization for the Coverage loss function, which allows us to conclude that the general case is NP-hard.
Billing errors increase the costs of power companies and lower their reliability as perceived by customers. The majority of these errors are due to wrong readings taken when employees of power companies visit customers to read electrical meters and issue bills. To prevent such errors, prediction techniques calculate a predicted value for each customer based on their previous readings, plus a tolerance around this value, sending a bill to be inspected by analysts if the reading falls outside the established range. However, such analysis increases the personnel cost of the power company. In addition, wrongly printed bills lead to possible lawsuits and fines that may also affect the costs and reliability of the power company. The main focus of this work is to minimize personnel cost by reducing the number of correct readings sent to unnecessary analysis, while protecting the power company's credibility by not increasing the number of bills with wrong values sent to clients in the process. The proposed solution uses Empirical Bayes methods along with a method to account for the seasonal behavior of customers. The methodology was applied to a dataset comprising 35,704,489 measurements from 1,330,989 different customers of a Brazilian power company. The results show that the new methodology was able to decrease the number of correct bills sent to analysis without harming the company's reputation.
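As a rough sketch of the Empirical Bayes idea (the normal-normal shrinkage model, the function name, and the numbers below are illustrative assumptions, not the paper's actual formulation), a customer's predicted reading can be a precision-weighted blend of their own history and the population:

```python
import statistics

def eb_prediction(customer_readings, pop_mean, pop_var):
    # Precision-weighted (normal-normal) shrinkage: customers with many,
    # stable readings keep a prediction close to their own mean, while
    # sparse or noisy histories are pulled toward the population mean.
    n = len(customer_readings)
    m = statistics.mean(customer_readings)
    s2 = statistics.variance(customer_readings) if n > 1 else pop_var
    w = (n / s2) / (n / s2 + 1.0 / pop_var)
    return w * m + (1.0 - w) * pop_mean

# Customer averaging ~105 kWh in a population averaging 150 kWh.
pred = eb_prediction([100, 110, 105], pop_mean=150, pop_var=400)
```

The tolerance band around `pred` can then be widened or narrowed per customer according to the same posterior uncertainty, which is how fewer correct readings end up flagged for analysis.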
A machine learning approach to perform automatic detection and diagnosis of faults in electrical submersible pump systems is presented. Several thousand vibration patterns were acquired from vertically distributed accelerometers along the string of motors, pumps, and protectors. Intermediate features are extracted from the raw vibration signals originating from the set of accelerometers. Each pattern was labelled by a human expert to provide ground truth with respect to the different operation classes (normal, sensor fault, rubbing, unbalance, or misalignment). A software framework is used to compare several classifier architectures (K-Nearest-Neighbor, Random Forest, Support Vector Machine, Naïve Bayes, and Decision Trees) in a bias-aware performance evaluation. To boost the classification performance, an ensemble of different versions of a classifier architecture is constructed using the Decision Templates fusion function. The robustness of the system with respect to the emergence of new faults (i.e., faults not treated so far) is corroborated by a systematic analysis methodology.
DBSCAN is a classic clustering method for identifying clusters of different shapes and isolating noisy patterns. Despite these qualities, many articles in the literature address the scalability problem of DBSCAN. This work presents two methods to generate a good sample for the DBSCAN algorithm; the execution time decreases due to the reduction in the number of patterns presented to DBSCAN. The first method is an improvement of Rough-DBSCAN and consistently presented better results. The second is a new heuristic called I-DBSCAN, capable of adapting and generating good results for all datasets without the need for any additional parameter.
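A minimal sketch of the leader-style sampling that Rough-DBSCAN builds on (the radius and data below are illustrative; the paper's improved heuristics differ in detail):

```python
def leader_sampling(points, radius):
    # Keep a point as a new "leader" only when no existing leader lies
    # within `radius`; followers are counted against their leader. Only
    # the (much smaller) leader set is later presented to DBSCAN.
    leaders, counts = [], []
    for p in points:
        for i, q in enumerate(leaders):
            if sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 <= radius:
                counts[i] += 1
                break
        else:
            leaders.append(p)
            counts.append(1)
    return leaders, counts

leaders, counts = leader_sampling(
    [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)], radius=1.0)
```

The follower counts let density estimates over the sample approximate densities over the full dataset, which is what keeps the clustering result close to running DBSCAN on all points.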
This work presents the concept of Cascade Feature Selection for combining feature selection methods. Fast and weak methods, such as ranking, are placed at the top of the cascade to reduce the dimensionality of the initial feature set, so that strong and computationally demanding methods, placed at the bottom of the cascade, have to deal with fewer features. Three cascade combinations are tested with the Extreme Learning Machine as the underlying classification architecture. The Tennessee Eastman chemical process simulation software and one high-dimensional data set are used as sources of the benchmark data. Experimental results suggest that the cascade arrangement can produce smaller final feature subsets, in less time, with higher classification performance than feature selection based on a Genetic Algorithm. Many works in the literature have proposed mixed methods with specific combination strategies; the main contribution of this work is a concept able to combine any existing method using a single strategy. Provided that the Cascade Feature Selection requirements are fulfilled, the combinations can reduce the time to select features or increase the classification performance of the classifiers trained with the selected features.
Distinct feature extraction methods are simultaneously used to describe bearing faults. This approach produces a large number of heterogeneous features that augment discriminative information but, at the same time, create irrelevant and redundant information. A subsequent feature selection phase retains only the most discriminative features. The feature models are based on the complex envelope spectrum, statistical time- and frequency-domain parameters, and wavelet packet analysis. Feature selection is achieved by conventional greedy search of the feature space. For the final fault diagnosis, the k-nearest neighbor classifier, feedforward neural network, and support vector machine are used. Performance criteria are the estimated error rate and the area under the receiver operating characteristic curve (AUC-ROC). Experimental results are shown for the Case Western Reserve University Bearing Data. The main contribution of this paper is the strategy of pooling several different feature models together with feature selection to optimize the fault diagnosis system. Moreover, robust performance estimation techniques usually not encountered in the engineering context are employed.
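For illustration only: the amplitude envelope that underlies envelope-spectrum features can be approximated by rectify-and-smooth. The paper actually uses the complex envelope obtained via the Hilbert transform, so the stand-in below is an assumed simplification, not the authors' method:

```python
import math

def envelope(signal, window=9):
    # Crude amplitude envelope: full-wave rectification followed by a
    # moving average, standing in for Hilbert-transform demodulation.
    rect = [abs(s) for s in signal]
    half = window // 2
    out = []
    for i in range(len(rect)):
        seg = rect[max(0, i - half):i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

# Constant-amplitude sine: its envelope should be roughly flat.
sig = [math.sin(2 * math.pi * i / 8) for i in range(64)]
env = envelope(sig)
mean_env = sum(env) / len(env)
```

Bearing defect frequencies then show up as peaks in the spectrum of the envelope rather than of the raw signal, which is why envelope features are so informative for rolling-element faults.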
We present a system for automatic model-free fault detection based on a feature set from vibrational patterns. The complexity of the feature model is reduced by feature selection. We use a wrapper approach for the selection criteria, incorporating the training of an artificial neural network into the selection process. For fast convergence we train with the Levenberg-Marquardt algorithm. Experiments are presented for eight different fault classes.
Non-technical energy losses mostly arise from the illegal use of energy and force energy distribution companies to inspect large batches of clients in order to decide on actions for reducing these losses. Since an exhaustive inspection is impractical due to the high inspection cost and the very large number of clients, a carefully designed sampling procedure is needed. A useful strategy is offered by stratified sampling based on a division of the clients into homogeneous subgroups (strata). In this work we formulate the stratification task as a non-linear constrained optimization problem, in which the variance of the overall energy loss due to fraudulent activities is minimized. Solving this problem analytically is difficult, and an exhaustive algorithm is intractable even for small problem instances. Therefore, we propose a Genetic Algorithm for finding practical solutions to the problem. Numerical experiments and a comparison with a Simulated Annealing algorithm and a proportional allocation scheme are presented.
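The quantity the Genetic Algorithm minimizes can be sketched as the variance of the stratified mean under Neyman allocation; the function name, toy data, and the specific allocation rule below are assumptions for illustration:

```python
import statistics

def stratified_variance(values, boundaries, total_sample):
    # Candidate fitness for the GA: variance of the stratified mean
    # estimator, given stratum boundaries over per-client loss estimates
    # and a fixed total inspection budget `total_sample`.
    strata = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        strata[sum(v > b for b in boundaries)].append(v)
    N = len(values)
    terms = [(len(s) / N, statistics.stdev(s)) for s in strata if len(s) > 1]
    denom = sum(W * S for W, S in terms)
    var = 0.0
    for W, S in terms:
        n_h = total_sample * (W * S) / denom  # Neyman allocation
        if n_h > 0:
            var += (W ** 2) * (S ** 2) / n_h
    return var

values = list(range(100))
var_one = stratified_variance(values, [], total_sample=10)    # one stratum
var_two = stratified_variance(values, [50], total_sample=10)  # two strata
```

Even a single well-placed boundary lowers the estimator variance relative to simple random sampling; the GA searches over boundary positions to exploit exactly this effect at scale.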
We perform an empirical performance analysis of the Multilayer Perceptron applied to the fault diagnosis of motor pumps installed on oil rigs. The conventional Multilayer Perceptron architecture is compared to a recently developed enhancement of this general-purpose regression/classification paradigm, which uses an intermediate opaque layer that maps the original patterns to a reproducing kernel Hilbert space prior to learning the usual functional mapping of the network. State-of-the-art statistical tools are used to corroborate our hypothesis that the kernel-enhanced version improves the classification performance.
The objective of this work is the model-free diagnosis of faults of motor pumps installed on oil rigs using sophisticated kernel classifier ensembles. Signal processing of vibrational patterns delivers the features. Different kernel-based classifiers are combined in ensembles to optimize accuracy and increase robustness. A comparative study of various classification paradigms, all performing implicit nonlinear pattern mapping by kernels, is carried out. We employ support vector machines, kernel nearest neighbor, Bayesian quadratic Gaussian classifiers with kernels, and linear machines with kernels.
We present a generic procedure for diagnosing faults using features extracted from noninvasive machine signals, based on supervised learning techniques to build the fault classifiers. An important novelty of our research is the use of 2000 examples of vibration signals obtained from operating faulty motor pumps, acquired from 25 oil platforms off the Brazilian coast over five years. Several faults can occur simultaneously in a motor pump. Each fault is individually detected in an input pattern by a distinct ensemble of support vector machine (SVM) classifiers. We propose a novel method for building an SVM ensemble, based on hill-climbing feature selection to create a set of accurate, diverse feature subsets, and on a grid-search parameter tuning technique to vary the parameters of the SVMs, aiming to increase their individual accuracy. Thus our ensemble composition method is based on the hybridization of two distinct, simple techniques originally designed for producing accurate single SVMs. The experiments show that the proposed method achieved a higher estimated prediction accuracy than using a single SVM classifier or the well-established genetic ensemble feature selection (GEFS) method for building SVM ensembles.
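The hill-climbing feature selection step can be sketched as a greedy forward search; the toy scoring function below merely stands in for the cross-validated accuracy of an SVM member and is an assumption for illustration:

```python
def hill_climb_select(features, evaluate):
    # Greedy forward selection: repeatedly add the single feature that
    # most improves the score; stop when no addition helps.
    selected, best = [], float("-inf")
    improved = True
    while improved:
        improved = False
        best_f = None
        for f in features:
            if f in selected:
                continue
            score = evaluate(selected + [f])
            if score > best:
                best, best_f, improved = score, f, True
        if improved:
            selected.append(best_f)
    return selected, best

# Toy evaluator: rewards features 0 and 2, slightly penalizes subset size.
score_fn = lambda s: sum(1 for f in s if f in (0, 2)) - 0.1 * len(s)
subset, score = hill_climb_select([0, 1, 2, 3], score_fn)
```

Running the search from different starting points (or on different data folds) yields the diverse feature subsets from which the ensemble members are built.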
This paper presents the results achieved by fault classifier ensembles based on a model-free supervised learning approach for diagnosing faults in oil rig motor pumps. The main goal is to compare two feature-based ensemble construction methods and to present a variation of one of them. The use of ensembles instead of single-classifier systems has lately been widely applied to classification problems. The diversification of classifiers performed by the methods presented in this work is obtained by varying the feature set each classifier uses and, in one case, also varying the intrinsic parameters of the training algorithm. We show results obtained with the established genetic algorithm GEFS and with our recently developed approach called BSFS, which has a lower computational cost. We rely on a database of real data, with 2000 acquisitions of vibration signals extracted from operational motor pumps. Our results compare the outcomes of the two methods and present a modification to one of them that improved accuracy, reinforcing the motivation for its use.
We present a supervised learning classification method for model-free fault detection and diagnosis, aiming to improve the maintenance quality of motor pumps installed on oil rigs. We investigate our generic fault diagnosis method on 2000 examples of real-world vibrational signals obtained from operational faulty industrial machines. The diagnostic system detects each considered fault in an input pattern using an ensemble of classifiers, composed of accurate classifiers that differ in their predictions as much as possible. The ensemble is built by first using complementary feature selection techniques to produce a set of candidate classifiers, and then selecting an optimized subset of them to compose the ensemble. We propose a novel ensemble creation method based on feature selection. We work with Support Vector Machine (SVM) classifiers. As the performance of an SVM strictly depends on its hyperparameters, we also study whether and how varying the SVM hyperparameters might increase the ensemble accuracy. Our experiments show the usefulness of appropriately tuning the SVM hyperparameters in order to increase the ensemble diversity and accuracy.
This paper presents the results achieved by fault classifier ensembles based on supervised learning for diagnosing faults in oil rig motor pumps. The main goal is to apply two feature-based ensemble construction methods to a real-world problem. Recent studies have shown that ensembles of classifiers that are accurate and at the same time produce diverse results can improve the final classification accuracy compared to a single accurate classifier. The diversification performed by the methods presented in this work is obtained by varying the feature set each classifier uses. We show results obtained with the established genetic algorithm GEFS and with a recently developed approach called BSFS, which has a lower computational cost. We rely on a database of real data, with 2000 acquisitions of vibration signals extracted from operational motor pumps. Our results show that the ensemble methods achieved higher classification accuracy than single classifiers on a real-world fault diagnosis problem, and both algorithms produced promising results, successfully solving the problem.
We report on fault diagnosis experiments to improve the maintenance quality of motor pumps installed on oil rigs. Our work is motivated by the diversity of the studied defects and the availability of real data from operational oil rigs. We present a fault diagnosis system that is well suited to overcome the difficulties arising in real-world fault diagnosis, for instance the occurrence of multiple coexisting defects. Each fault is predicted by a distinct ensemble of Support Vector Machine (SVM) classifiers which differ among themselves in the feature set they use as well as in their intrinsic parameters. To build the ensemble, we apply a novel approach based on the outputs of several stepwise wrapper feature selection methods. Our method requires a minimum of a priori knowledge about the plant, because the fault predictor is automatically defined based on training data, allowing the method to be easily extended to many types of equipment, sensors, and faults.
The support vector machine (SVM) classifier is currently one of the most powerful techniques for solving binary classification problems. To further increase the accuracy of an individual SVM we use an ensemble of SVMs, composed of classifiers that are as accurate and divergent as possible. We investigate the usefulness of SVM ensembles in which the classifiers differ among themselves in both the feature set and the SVM parameter values they use, which might increase the diversity among the classifiers and therefore the ensemble accuracy. We propose a novel ensemble creation method aiming to create an optimized ensemble: first we apply complementary feature selection methods to generate a set of feature subsets, and then for each feature subset we build an SVM classifier with tuned parameters. Our experiments show that this method achieved a higher estimated prediction accuracy than other approaches for creating SVM ensembles. We work in the context of real-world industrial machine fault diagnosis, using 2000 examples of vibrational signals obtained from operational faulty motor pumps installed on oil platforms.
This work proposes a methodology to improve the calculation of technical losses in power distribution networks using estimates of non-technical losses. The estimation procedure for non-technical losses is based on the consumers' energy consumption history and on the results of field inspections. Our motivation is the fact that the main methodologies for computing technical losses proposed in the literature do not consider the influence of non-technical losses, and hence are not very precise. The methodology developed here is flexible and robust, and can be applied even when the irregularities history maintained by the utility company is small. The clustering and sampling techniques used allow an estimation of technical losses closer to the real value.
We present a collection of pattern recognition techniques applied to fault detection and diagnosis of motor pumps. Vibrational patterns are the basis for describing the condition of the process. We rely on the data-driven approach to the learning of the fault classes, i.e. supervised learning in pattern recognition. Our work is motivated by the diversity of the studied defects, the availability of real data from operational oil rigs, and the use of statistical pattern recognition techniques usually not explored sufficiently in similar works. We show the results of automatic methods to define, select and combine features that describe the process and to classify the faults on the provided examples. The support vector machine is chosen as the classification architecture.
We report on fault diagnosis experiments to improve the maintenance quality of motor pumps installed on oil rigs. We rely on the data-driven approach to the learning of the fault classes, i.e. supervised learning in pattern recognition. Features are extracted from the vibration signals to detect and diagnose misalignment and mechanical looseness problems. We show the results of automatic pattern recognition methods to define and select features that describe the faults of the provided examples. The support vector machine is chosen as the classification architecture.
The classification problem is recurrent in the context of supervised learning. A classification problem is a class of computational task in which labels must be assigned to object instances using information acquired from labeled instances of the same type of objects. When these objects contain time-sensitive data, special classification methods can be used to take advantage of the inherent extra information. In this paper, the time-sensitive data are sequences of values representing the measured energy consumption of residential clients in a given month. Traditional classifiers do not take temporal features into account, interpreting them as a series of unrelated static values. We propose classification methods, applied to a real time-series problem, that treat the time series as the same quantity being repeatedly measured. Two new approaches are suggested to deal with this problem: the first is a hybrid classifier that uses clustering, DTW (Dynamic Time Warping), and the Euclidean distance to label a given instance; the second is a Weighted Curve Comparison Algorithm that creates consumption profiles and compares them with the unknown instance to classify it.
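The DTW distance used by the hybrid classifier can be sketched with the textbook dynamic program (a standard formulation, not necessarily the authors' implementation):

```python
def dtw(a, b):
    # Dynamic Time Warping distance between two numeric sequences:
    # D[i][j] holds the cheapest cost of aligning a[:i] with b[:j].
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # stretch a
                                 D[i][j - 1],      # stretch b
                                 D[i - 1][j - 1])  # match step
    return D[n][m]
```

Unlike the Euclidean distance, DTW tolerates local stretching, so a consumption curve whose seasonal peak arrives a month late can still match its profile with near-zero cost.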
This paper presents vibration analysis techniques for fault detection in rotating machines. Rolling-element bearing defects inside a motor pump are the subject of study. Signal processing techniques, such as frequency filters, the Hilbert transform, and spectral analysis, are used to extract features that later serve as a basis for classifying the condition of machines. Pattern recognition techniques are then applied to the obtained features to improve the classification precision. In a previous work, a graphic simulation was used to produce signals illustrating the idea of the method; in this work we examine the performance of this method for monitoring bearing condition when applied to rotating machines on oil rigs, that is, to real problems.
In this paper, we evaluate techniques for the time series classification problem. Many distance measures have been proposed as alternatives to the Euclidean distance in the nearest neighbor classifier. To verify the assumption that combining various similarity measures may produce a more accurate classifier, we propose an algorithm that combines several measures based on weights. We carried out a set of experiments to verify the hypothesis that the new algorithm is better than the classical ones. Our results show an improvement over the well-established nearest neighbor with DTW (Dynamic Time Warping), although in general the improvements were obtained by combining only a few measures in each problem used in the experimental evaluation.
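The weight-based combination can be sketched as a 1-NN classifier over a weighted sum of distances; the two component measures, the weights, and the toy data below are illustrative assumptions, not the paper's tuned configuration:

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def weighted_nn(train, query, measures, weights):
    # 1-NN under a weighted sum of several distance measures; `train`
    # is a list of (series, label) pairs.
    def dist(a, b):
        return sum(w * m(a, b) for m, w in zip(measures, weights))
    return min(train, key=lambda item: dist(item[0], query))[1]

label = weighted_nn([([0, 0], "low"), ([9, 9], "high")],
                    [1, 1], [euclidean, manhattan], [0.5, 0.5])
```

In the paper's setting the weights are what gets learned per problem, which is consistent with the observation that only a few measures end up carrying most of the weight.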
This paper presents vibration analysis techniques for fault detection in rotating machines. Rolling-element bearing defects inside a motor pump are the object of study. A dynamic model of the faults usually found in this context is presented. Initially, a graphic simulation is used to produce the signals. Signal processing techniques, such as frequency filters, the Hilbert transform, and spectral analysis, are then used to extract features that later serve as a basis for classifying the states of the studied process. Finally, real data from a centrifugal pump are submitted to the developed methods.
Clustering is defined as the task of dividing a data set such that elements within each subset are similar to one another and dissimilar to elements belonging to other subsets. This problem can be understood as an optimization problem that looks for the best configuration of the clusters among all possible configurations. K-means is the most popular approximate algorithm applied to the clustering problem, but it is very sensitive to the starting solution and can get stuck in local optima. Metaheuristics can also be used to solve the problem; nevertheless, the direct application of metaheuristics to the clustering problem seems to be effective only on small data sets. This work suggests using methods for finding initial solutions to the K-means algorithm in order to initialize Simulated Annealing and search for solutions near the global optimum.
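One way to obtain a good initial solution of the kind the paper feeds to Simulated Annealing is D²-weighted (k-means++-style) seeding; this sketch is an assumed illustration of such an initializer, not necessarily the one the authors used:

```python
import random

def kmeanspp_init(points, k, rng):
    # D^2-weighted seeding: each new center is drawn with probability
    # proportional to its squared distance to the nearest chosen center,
    # so the initial centers tend to land in distinct clusters.
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points]
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers

rng = random.Random(42)
pts = [(0.0, 0.0), (0.2, 0.1), (10.0, 10.0), (10.1, 9.9)]
centers = kmeanspp_init(pts, k=2, rng=rng)
```

Starting Simulated Annealing from such a solution lets the annealing schedule spend its budget refining a near-optimal configuration instead of escaping the poor local optima that a random start would produce.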
This work presents vibration signal analysis techniques for detecting faults in rotating machines. Rolling-element bearing defects inside a motor pump are the object of study. A dynamic model of the faults typically found in this context is defined. Initially, a graphic simulation is used to produce the signals. Signal processing techniques, such as frequency filters, the Hilbert transform, and spectral analysis, are used to extract features that later serve as a basis for classifying the states of the studied process. Finally, real data from a centrifugal pump are submitted to the developed methods.
The electricity distribution service to consumers is provided by utility companies and monitored by ANEEL, regarding supply reliability, through continuity indicators. A utility seeks to improve its indicators by installing protection devices, such as reclosers and fuses, and sectionalizing devices, such as switches, along the distribution network. Most of the literature deals only with the allocation of reclosers in feeders. This article proposes two methods to optimize the allocation of reclosers in substations, which comprise several feeders, thus tackling problems one level above the usual.
An electricity distribution company must provide consumers with a reliable service, whose quality is quantified by continuity indicators defined by ANEEL (the Brazilian National Electric Energy Agency). A key point for improving the indicators is the proper allocation of protection and sectionalizing equipment (reclosers, fuses, and switches) in the network, among which reclosers play the most relevant role and also have the highest cost. The literature basically deals with the allocation of equipment in feeders. This article presents two methods to address the problem of optimizing recloser allocation in large-scale distribution networks, more specifically in substations (which are composed of feeders).
A company responsible for supplying electricity (a utility) seeks to provide a good-quality service while keeping operational costs low. To quantify, analyze, and regulate the performance of utilities, ANEEL defines continuity indicators for distribution networks and establishes targets to be met for these indicators. To this end, utilities must install protection and sectionalizing equipment at suitable locations in the distribution network, so as to increase its reliability. This work proposes a Binary Non-Linear Programming model to improve the continuity indicators of a distribution network, identifying the type and location of the devices to be installed along the network. The new model is more general than those presented in the literature, treating them as particular cases and allowing better-quality solutions to be found.
Companies responsible for supplying electricity (utilities) must install protection and sectionalizing equipment at suitable locations in the distribution network in order to provide a good-quality service. Regulatory agencies establish metrics (continuity indicators) to quantify, analyze, and regulate the performance of the utilities. This work proposes a Binary Non-Linear Programming model, more general than the existing ones, to identify the type and location of protection equipment in a feeder so as to minimize its indicators. An estimate of the amount of electric energy saved by the reduction of the indicators is presented.
In this work, we evaluate alternatives for the time series classification problem. Aiming to increase the accuracy of the nearest neighbor classifier, several metrics have been proposed as alternatives to the Euclidean distance. We evaluate some of these options and, based on the Wilcoxon test for paired data, produce a list of those for which there is evidence of improvement in classifier accuracy.
One of the biggest problems faced by electricity distribution companies in Brazil is energy theft. This work shows the use of computational intelligence techniques to detect possible occurrences of illicit energy use and irregular installations. The goal is to increase the chances of success of the field inspections carried out by the companies. Data mining techniques and knowledge-based systems were used. Besides describing how these techniques were employed, this article presents the three software products developed to select consumers for inspection and the experimental results achieved.
The task of fraud detection has always been an issue for credit card and telecom companies. It has also become a serious problem in Brazil for power distribution companies. Detecting fraud is not a simple task and requires a careful examination of a huge amount of data collected throughout years of operation. The numbers involving fraud are always impressive, and recovered resources can easily reach millions of dollars. This paper presents a business case that combines the use of GIS and AI systems to recover lost funds. Our company developed during 2003/2004 an R&D project to help detect fraud and to deter energy theft and recover the associated losses. In this effort, the knowledge of experts was blended into computing algorithms using data mining techniques to unveil the customers who were deliberately causing losses to the company. Neural networks, Bayesian networks, decision trees, and knowledge-based systems were among the techniques used. Results were evaluated and we were able to enhance the software. We are working to use Brazil's 2000 Census information in our GIS database to help the data mining algorithms track down cases. The process involved in this effort is quite complex, and the presentation will share the major issues and obstacles faced by the project, the results achieved, and the future research we intend to pursue. This project was developed by ESCELSA and UFES – Federal University of Espírito Santo. ESCELSA is a power distribution company responsible for energy distribution in 69 of the 77 municipalities of the Brazilian southeastern state of Espírito Santo. It has around 3 million customers and 1 million meter readings. It was the first Brazilian energy company to be privatized, in 1995. ESCELSA is one of the companies of the EDP Group.
This work describes a tool for automatic fraud detection in the electricity distribution system. Even with a large amount of noise and contradictions in the data, we were able to obtain discriminative information. In particular, we used the time series of consumption values and evaluation criteria based on the classifiers' confusion matrices.
Some classification techniques have parameters that must be configured before the training process. Variations in the parameters cause variations in the performance of the generated classifiers. It is therefore necessary to search for parameter values that maximize the performance of the generated classifier, a process that is quite laborious when performed manually. This work proposes automating this process through the use of genetic algorithms. The automatic process was applied to a particular application in the domain of fraud detection at a power distribution company.
ADD (Active Design Documents) is an approach that offers computational support for the development and documentation of engineering designs. JADE is a computational environment intended to facilitate the construction, use, and reuse of Active Document systems in several application domains. JADE also generalizes the ADD approach to tasks other than design. JADE uses a representation language based on a parametric dependency network. This work presents a proposed extension of this representation language that allows the dynamic generation of parameters during problem solving.
© 2013 Universidade Federal do Espírito Santo. All rights reserved. Av. Fernando Ferrari, 514 - Goiabeiras, Vitória - ES | CEP 29075-910