CACHE HIERARCHY INSPIRED COMPRESSION: A NOVEL ARCHITECTURE FOR DATA STREAMS
DOI:
https://doi.org/10.33736/jita.54.2007
Keywords:
Data streams, classification, cache hierarchy
Abstract
We present an architecture for data streams based on structures typically found in web cache hierarchies. The main idea is to build a meta-level analyser from a number of levels constructed over time from a data stream. We present the general architecture for such a system and an application to classification. This architecture is an instance of the general wrapper idea, allowing us to reuse standard batch learning algorithms in an inherently incremental learning environment. By artificially generating data sources, we demonstrate that a hierarchy containing a mixture of models is able to adapt over time to the source of the data. In these experiments the hierarchies use an elementary performance-based replacement policy and unweighted voting for making classification decisions.
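The replacement policy and voting scheme mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single flat level rather than a full hierarchy, a toy nearest-centroid batch learner, and eviction of the stored model that scores worst on the newest chunk. The names `CentroidModel`, `ModelCache`, `make_chunk` and `simulate` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


class CentroidModel:
    """Toy batch learner (an assumption): nearest class centroid on 1-D inputs."""

    def __init__(self, xs, ys):
        sums, counts = Counter(), Counter()
        for x, y in zip(xs, ys):
            sums[y] += x
            counts[y] += 1
        self.centroids = {y: sums[y] / counts[y] for y in counts}

    def predict(self, x):
        return min(self.centroids, key=lambda y: abs(x - self.centroids[y]))


def accuracy(model, xs, ys):
    """Fraction of examples that `model` labels correctly."""
    return sum(model.predict(x) == y for x, y in zip(xs, ys)) / len(xs)


class ModelCache:
    """Fixed-capacity pool of models (one flat level, a simplification)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.models = []

    def predict(self, x):
        # Unweighted voting: every stored model casts one vote.
        votes = Counter(m.predict(x) for m in self.models)
        return votes.most_common(1)[0][0]

    def update(self, xs, ys):
        # Performance-based replacement (assumed form): when the pool is
        # full, evict the stored model that does worst on the newest chunk.
        if len(self.models) >= self.capacity:
            worst = min(self.models, key=lambda m: accuracy(m, xs, ys))
            self.models.remove(worst)
        # Wrapper idea: train a batch learner on the chunk and store it.
        self.models.append(CentroidModel(xs, ys))


def make_chunk(rng, flipped, n=200):
    """Artificial source: label is 1 when x > 0.5, inverted if `flipped`."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [int((x > 0.5) != flipped) for x in xs]
    return xs, ys


def simulate(seed=0):
    """Feed three chunks from one source, then three from the inverted one."""
    rng = random.Random(seed)
    cache = ModelCache(capacity=3)
    for _ in range(3):
        cache.update(*make_chunk(rng, flipped=False))
    acc_first = accuracy(cache, *make_chunk(rng, flipped=False))
    for _ in range(3):
        cache.update(*make_chunk(rng, flipped=True))
    acc_second = accuracy(cache, *make_chunk(rng, flipped=True))
    return acc_first, acc_second
```

Calling `simulate()` returns the voting accuracy on a fresh chunk from each source, measured after the pool has seen three chunks from that source. In this toy setting the models trained on the first source are evicted one per chunk once the source changes, which is the kind of adaptation the abstract describes.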