High Dimensional Data Clustering Based On Feature Selection Algorithm
K. Swathi, B. Ranjith
M.Tech Research Scholar, Priyadarshini Institute of Technology and Science for Women
HOD-CSE, Priyadarshini Institute of Technology and Science for Women
Feature selection is the process of identifying a subset of the most useful features that produces results comparable to the original, entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view: efficiency concerns the time required to find a subset of features, while effectiveness concerns the quality of that subset. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated. The FAST algorithm works in two steps. In the first step, features are divided into clusters using graph-theoretic clustering methods. In the second step, the most representative feature, i.e., the one most strongly related to the target classes, is selected from each cluster to form the final subset of features. Because features in different clusters are relatively independent, the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. A Minimum Spanning Tree (MST) built with Prim's algorithm grows only one tree at a time; to ensure the efficiency of FAST, the MST is instead constructed with Kruskal's algorithm.
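The two steps above can be sketched in code. This is a minimal, illustrative version only: it uses the absolute Pearson correlation as a stand-in relevance measure (the actual measure used by FAST is not specified in this abstract), builds a complete feature graph, runs Kruskal's algorithm with a union-find structure, cuts MST edges above a hypothetical distance threshold to form clusters, and keeps the feature most correlated with the target from each cluster. The function name `fast_like_selection` and the `threshold` parameter are assumptions for illustration.

```python
import numpy as np

def kruskal_mst(n, edges):
    """Kruskal's algorithm with union-find; edges are (weight, u, v)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:          # joining two different trees never forms a cycle
            parent[ru] = rv
            mst.append((w, u, v))
    return mst

def fast_like_selection(X, y, threshold=0.3):
    """Illustrative two-step, clustering-based feature selection:
    1) cluster features by cutting weak edges of a Kruskal MST,
    2) keep the feature most related to the target from each cluster.
    `threshold` and the correlation-based distance are assumptions,
    not the measure defined in the FAST paper."""
    n = X.shape[1]
    # Distance between features: 1 - |Pearson correlation|.
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            r = abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
            edges.append((1.0 - r, i, j))
    mst = kruskal_mst(n, edges)

    # Keeping only MST edges at or below the threshold partitions the
    # features into clusters of mutually related features.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for w, u, v in mst:
        if w <= threshold:
            parent[find(u)] = find(v)
    clusters = {}
    for f in range(n):
        clusters.setdefault(find(f), []).append(f)

    # From each cluster, select the feature most correlated with the target.
    selected = [max(members,
                    key=lambda f: abs(np.corrcoef(X[:, f], y)[0, 1]))
                for members in clusters.values()]
    return sorted(selected)
```

With two near-duplicate features and one independent one, the duplicates fall into a single cluster and only one of them survives, which mirrors the redundancy-removal behavior the abstract describes.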
K. Swathi, B. Ranjith, "High Dimensional Data Clustering Based On Feature Selection Algorithm", International Journal of Computer Engineering In Research Trends (IJCERT), ISSN: 2349-7084, Vol. 1, Issue 06, pp. 379-383, December 2014. URL: https://ijcert.org/ems/ijcert_papers/V1I65.pdf