CACHE HIERARCHY INSPIRED COMPRESSION: A NOVEL ARCHITECTURE FOR DATA STREAMS

Authors

  • G. Holmes, Computer Science Department, University of Waikato, Hamilton, New Zealand
  • B. Pfahringer, Computer Science Department, University of Waikato, Hamilton, New Zealand
  • R. Kirkby, Computer Science Department, University of Waikato, Hamilton, New Zealand

DOI:

https://doi.org/10.33736/jita.54.2007

Keywords:

Data streams, classification, cache hierarchy

Abstract

We present an architecture for data streams based on structures typically found in web cache hierarchies. The main idea is to build a meta-level analyser from a number of levels constructed over time from a data stream. We describe the general architecture of such a system and its application to classification. The architecture is an instance of the general wrapper idea, which allows us to reuse standard batch learning algorithms in an inherently incremental learning environment. Using artificially generated data sources, we demonstrate that a hierarchy containing a mixture of models can adapt over time to the source of the data. In these experiments the hierarchies use an elementary performance-based replacement policy and unweighted voting to make classification decisions.
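The abstract describes the architecture only at a high level, so the following is a minimal Python sketch of one possible reading of it, not the authors' implementation. The assumptions, made purely for illustration, are: level k wraps a batch learner and trains a fresh model on every `chunk_size * 2**k` stream examples; each level holds at most `slots` models; when a level is full, the newcomer is admitted and the resident model with the lowest accuracy on the newest chunk is evicted (one simple performance-based replacement policy); and predictions are an unweighted majority vote over all levels. `Stump` is a hypothetical one-feature decision stump standing in for any standard batch algorithm, and the names `CacheHierarchy`, `chunk_size`, `levels`, `slots` and `drift_demo` are likewise hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Example = Tuple[Tuple[float, ...], int]  # (feature vector, class label)


class Stump:
    """Hypothetical stand-in for a 'standard batch learner': a one-level decision stump."""

    def fit(self, data: Sequence[Example]) -> "Stump":
        best = -1
        for f in range(len(data[0][0])):
            for t in sorted({x[f] for x, _ in data}):
                left = Counter(y for x, y in data if x[f] <= t)
                right = Counter(y for x, y in data if x[f] > t)
                lab_l = left.most_common(1)[0][0]
                lab_r = right.most_common(1)[0][0] if right else lab_l
                correct = left[lab_l] + right[lab_r]
                if correct > best:  # keep the split that gets most training examples right
                    best = correct
                    self.feature, self.threshold = f, t
                    self.left, self.right = lab_l, lab_r
        return self

    def predict(self, x: Tuple[float, ...]) -> int:
        return self.left if x[self.feature] <= self.threshold else self.right


def accuracy(model, data: Sequence[Example]) -> float:
    return sum(model.predict(x) == y for x, y in data) / len(data)


class CacheHierarchy:
    """Sketch (assumed sizing, not from the paper): level k trains a new batch model
    on every chunk_size * 2**k stream examples and keeps at most `slots` models."""

    def __init__(self, learner=Stump, chunk_size: int = 50, levels: int = 3, slots: int = 4):
        self.learner, self.chunk_size, self.slots = learner, chunk_size, slots
        self.pools: List[list] = [[] for _ in range(levels)]
        self.buffers: List[List[Example]] = [[] for _ in range(levels)]

    def update(self, example: Example) -> None:
        # Every level sees every example; larger levels wait for larger chunks.
        for k, buf in enumerate(self.buffers):
            buf.append(example)
            if len(buf) == self.chunk_size * 2 ** k:
                self._admit(k, self.learner().fit(buf), buf)
                self.buffers[k] = []

    def _admit(self, k: int, model, recent: Sequence[Example]) -> None:
        pool = self.pools[k]
        if len(pool) >= self.slots:
            # Elementary performance-based replacement: evict the resident model
            # with the lowest accuracy on the newest chunk (which none of them saw).
            pool.pop(min(range(len(pool)), key=lambda i: accuracy(pool[i], recent)))
        pool.append(model)

    def predict(self, x: Tuple[float, ...]) -> Optional[int]:
        # Unweighted majority vote over every model in every level.
        votes = Counter(m.predict(x) for pool in self.pools for m in pool)
        return votes.most_common(1)[0][0] if votes else None


def drift_demo(seed: int = 0, n_before: int = 600, n_after: int = 1200) -> Tuple[float, float]:
    """Artificial stream: concept A (label = x0 > 0.5), then abrupt drift to concept B (label = x1 > 0.5)."""
    rng = random.Random(seed)
    concepts = [lambda x: int(x[0] > 0.5), lambda x: int(x[1] > 0.5)]
    h = CacheHierarchy()
    scores = []
    for concept, n in zip(concepts, (n_before, n_after)):
        for _ in range(n):
            x = (rng.random(), rng.random())
            h.update((x, concept(x)))
        test = [(rng.random(), rng.random()) for _ in range(500)]
        scores.append(sum(h.predict(x) == concept(x) for x in test) / len(test))
    return scores[0], scores[1]
```

Calling `drift_demo()` feeds 600 examples of one concept followed by 1,200 of a drifted concept and returns the vote's accuracy on fresh samples after each phase. Because eviction in this sketch is driven by accuracy on the newest chunk, models trained before the drift are the first to be replaced, which is how the hierarchy adapts to the current source of the data.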

Published

2016-04-26

How to Cite

Holmes, G., Pfahringer, B., & Kirkby, R. (2016). Cache hierarchy inspired compression: A novel architecture for data streams. Journal of IT in Asia, 2(1), 39–52. https://doi.org/10.33736/jita.54.2007

Section

Articles