
September 2019

What's New and Coming to Apache Hivemall: Building More Flexible Machine Learning Solution for Apache Hive and Spark

ApacheCon North America 2019


Abstract

Apache Hivemall is a scalable machine learning library for Apache Hive, Spark, and Pig. Hivemall allows us to apply a wealth of machine learning techniques to massive data stored in distributed storage by just writing a series of SQL-like queries. It provides classification, regression, recommendation, anomaly detection, and topic modeling functionalities in a scalable manner, along with a variety of auxiliary functions for data preprocessing and feature engineering.

This talk demonstrates the Hivemall library, with special emphasis on the new features merged after the first Apache Incubator release. Hivemall v0.5.2-incubating, the latest version as of April 2019, introduced a state-of-the-art generalized factor model named Field-Aware Factorization Machines, along with many useful UDFs (e.g., for data sketching) that originated in the Brickhouse Hive UDF package.
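To make "Field-Aware Factorization Machines" concrete, here is a minimal Python sketch of how an FFM scores a single example. It assumes the commonly cited model form with a global bias and linear terms, $\hat{y}(x) = w_0 + \sum_i w_i x_i + \sum_{i<j} \langle v_{i,f_j}, v_{j,f_i} \rangle x_i x_j$. The key idea is that each feature keeps a separate latent vector per *field* and picks the one matching its partner's field. The function name `ffm_predict` and the tuple-based input layout are hypothetical illustrations. This is not Hivemall's API or its implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: one feature occurrence is (field, feature, value).
Feature = Tuple[int, int, float]


def ffm_predict(
    x: List[Feature],
    w0: float,
    w: Dict[int, float],
    v: Dict[Tuple[int, int], List[float]],
) -> float:
    """Raw FFM score: w0 + sum_i w_i*x_i + sum_{i<j} <v[i,f_j], v[j,f_i]> * x_i * x_j.

    v[(feature, field)] is the latent vector that `feature` uses when it
    interacts with any feature belonging to `field`. Every pair needed by
    the input must be present, otherwise a KeyError is raised.
    """
    score = w0
    # Linear part; features without a learned weight contribute 0.
    for _, feat, val in x:
        score += w.get(feat, 0.0) * val
    # Pairwise part: each feature uses the vector tied to its partner's field.
    for a in range(len(x)):
        f_i, i, x_i = x[a]
        for b in range(a + 1, len(x)):
            f_j, j, x_j = x[b]
            v_i = v[(i, f_j)]
            v_j = v[(j, f_i)]
            dot = sum(p * q for p, q in zip(v_i, v_j))
            score += dot * x_i * x_j
    return score


# Usage: three features, each in its own field, with latent dimension 2.
example_x = [(0, 0, 1.0), (1, 1, 1.0), (2, 2, 1.0)]
example_v = {
    (0, 1): [1.0, 0.0], (1, 0): [1.0, 0.0],  # pair (0, 1): dot = 1
    (0, 2): [0.0, 1.0], (2, 0): [0.0, 2.0],  # pair (0, 2): dot = 2
    (1, 2): [1.0, 1.0], (2, 1): [1.0, 1.0],  # pair (1, 2): dot = 2
}
print(ffm_predict(example_x, 0.5, {0: 1.0, 1: -1.0}, example_v))  # 5.5
```

A plain factorization machine would share one vector per feature across all interactions. The per-field lookup `v[(i, f_j)]` is the only structural difference here. For binary classification, this raw score would typically be passed through a sigmoid.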

We also present the roadmap of this incubating project. Open issues and pull requests include Apache Spark 2.4 support, implementations of new algorithms such as word2vec and multinomial logistic regression, and integration with widely used tools like XGBoost and LightGBM.
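Multinomial logistic regression is one of the roadmap items. As a rough illustration of what it computes, the sketch below keeps one sparse weight vector per class, converts the per-class scores into probabilities with softmax, and applies the standard cross-entropy SGD update $w_k \leftarrow w_k - \eta\,(p_k - y_k)\,x$. The function names, the dict-based sparse feature format, and the learning rate are assumptions made for this example. They do not describe how Hivemall implements or exposes the algorithm.

```python
import math
from typing import Dict, List

SparseVec = Dict[str, float]  # hypothetical sparse format: feature name -> value


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W: List[SparseVec], x: SparseVec) -> List[float]:
    """Class probabilities given one weight vector per class."""
    scores = [sum(w_k.get(f, 0.0) * val for f, val in x.items()) for w_k in W]
    return softmax(scores)


def sgd_step(W: List[SparseVec], x: SparseVec, label: int, lr: float = 0.1) -> None:
    """One in-place SGD update on the cross-entropy loss for a single example."""
    probs = predict_proba(W, x)
    for k, w_k in enumerate(W):
        # Gradient w.r.t. w_k is (p_k - y_k) * x, with y one-hot.
        grad_scale = probs[k] - (1.0 if k == label else 0.0)
        for f, val in x.items():
            w_k[f] = w_k.get(f, 0.0) - lr * grad_scale * val


# Usage: three classes, starting from all-zero weights (uniform predictions).
weights: List[SparseVec] = [{} for _ in range(3)]
sample = {"a": 1.0, "b": 0.5}
before = predict_proba(weights, sample)
for _ in range(20):
    sgd_step(weights, sample, label=2)
after = predict_proba(weights, sample)
```

With all-zero weights every class starts at probability 1/3. Repeated updates on label 2 then shift probability mass toward that class.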

Slides

  Written by: たくち (takuti)

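I'm たくち (takuti), a software engineer from Nagano, Japan, now living in Vancouver, Canada. Across both B2B and B2C domains, I have worked on turning data science and machine learning into products, on helping customers adopt them through implementation support and consulting, and on evangelism in related fields. My hobbies are traveling, running marathons, hiking, and visiting breweries. For recent updates, see takuti.me/now. Feel free to reach out anytime with comments or feedback on the blog, via @takuti or [email protected].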


* All statements on this site and related media are my own personal views and do not represent the views of any organization I belong to or have belonged to.