Dimensionality Reduction Approach using Attributes Extraction and Attributes Selection in Gene Expression Databases
Keywords: Data Dimensionality Reduction, Attribute Selection, Attribute Extraction, Microarray
Gene expression databases contain a very large number of attributes. Dimensionality reduction addresses this by reducing the number of attributes to be processed and by improving the generalization capability of learning methods through the elimination of irrelevant and/or redundant data. This paper proposes a dimensionality reduction approach that combines attribute extraction with attribute selection. For attribute extraction we used the Random Projection method, and for attribute selection we used the filter and wrapper approaches. The experiments were carried out on five gene expression microarray databases. The results show that combining these approaches can provide promising results.
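The combination described above can be illustrated with a minimal sketch: a random projection first maps the original attributes to a smaller set of extracted attributes, and a filter-style ranking then keeps only a few of them. Everything specific here is an assumption made for illustration, not the paper's configuration: the Gaussian projection matrix, the absolute Pearson correlation used as the filter score, the toy data, and the function names `random_projection`, `abs_pearson`, and `filter_select` are all hypothetical. The wrapper step, which would score candidate subsets with a classifier, is left out.

```python
import math
import random


def random_projection(rows, k, seed=0):
    """Project each row (a list of d floats) onto k random Gaussian directions."""
    rng = random.Random(seed)
    d = len(rows[0])
    # Assumption: entries drawn from N(0, 1) and scaled by 1/sqrt(k), so that
    # squared norms are preserved in expectation.
    matrix = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)]
              for _ in range(d)]
    return [[sum(row[i] * matrix[i][j] for i in range(d)) for j in range(k)]
            for row in rows]


def abs_pearson(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information about the class
    return abs(cov) / math.sqrt(var_x * var_y)


def filter_select(rows, labels, m):
    """Keep the m columns most correlated with the class labels (filter step)."""
    k = len(rows[0])
    scores = [abs_pearson([row[j] for row in rows], labels) for j in range(k)]
    ranked = sorted(range(k), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:m])
    return keep, [[row[j] for j in keep] for row in rows]


# Toy data (hypothetical): 6 samples, 50 "genes", binary class labels.
toy_rng = random.Random(1)
labels = [0, 0, 0, 1, 1, 1]
data = [[toy_rng.gauss(float(label), 1.0) for _ in range(50)] for label in labels]

projected = random_projection(data, k=10)               # attribute extraction
kept, reduced = filter_select(projected, labels, m=3)   # attribute selection
```

In this sketch the 50 original attributes are reduced to 10 extracted attributes and then to 3 selected ones. A wrapper approach would replace the correlation score with the accuracy of a learning algorithm trained on each candidate subset.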