Hivemall

2016 - 2022

Scope

Core contributor to Apache Hivemall, a machine learning library that lets users train and apply models at scale through SQL-like queries on Apache Hive, Spark, and Pig. Contributed new algorithms and PySpark integration, and presented the project at international conferences.

Technology

Platform: Apache Hive, Apache Spark (PySpark), Apache Pig
Algorithms: Classification, Regression, Recommendation, Anomaly Detection, NLP, Topic Modeling
Features: Field-Aware Factorization Machines, Data Sketching, Feature Engineering UDFs
Integration: XGBoost, LightGBM, Digdag workflow engine

Key Contributions

PySpark Integration: Made Hivemall functions available from Python through a SparkSession with Hive support, combining the scalability of Spark and Hive with the flexibility of the Python ecosystem
Algorithm Implementation: Contributed state-of-the-art generalized factor models and recommendation techniques
Query-Based ML in Production: Simplified end-to-end ML workflows with a SQL-like interface, replacing many scattered code fragments with a few dozen lines of queries
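To make the query-based idea more concrete, here is a minimal plain-Python sketch; it is not Hivemall code. It assumes the general shape used in Hivemall's tutorials, where a training UDTF runs on each data split and emits (feature, weight) rows, and an outer query averages those weights GROUP BY feature. The function names (parse_feature, train_partition, train_query, predict), the "name:value" feature strings, and the choice of plain SGD logistic regression as the per-split learner are illustrative assumptions.

```python
import math
from collections import defaultdict


def parse_feature(f):
    # Assumed "name:value" feature string; a bare "name" is treated as value 1.0.
    name, sep, value = f.partition(":")
    return name, (float(value) if sep else 1.0)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_partition(rows, eta=0.1, epochs=10):
    """Hypothetical stand-in for a training UDTF: SGD logistic regression
    over one data split, emitting (feature, weight) rows."""
    w = defaultdict(float)
    for _ in range(epochs):
        for features, label in rows:
            xs = [parse_feature(f) for f in features]
            p = sigmoid(sum(w[n] * v for n, v in xs))
            g = label - p  # gradient of the log-likelihood w.r.t. the score
            for n, v in xs:
                w[n] += eta * g * v
    return list(w.items())


def train_query(table, num_splits=2):
    """Mimics the query shape:
    SELECT feature, AVG(weight) FROM (
      SELECT <training UDTF>(features, label) AS (feature, weight) FROM training
    ) t GROUP BY feature
    """
    splits = [table[i::num_splits] for i in range(num_splits)]
    emitted = [row for split in splits for row in train_partition(split)]
    sums, counts = defaultdict(float), defaultdict(int)
    for feature, weight in emitted:  # the GROUP BY feature / AVG(weight) step
        sums[feature] += weight
        counts[feature] += 1
    return {f: sums[f] / counts[f] for f in sums}


def predict(model, features):
    # Features missing from the model contribute nothing to the score.
    return sigmoid(sum(model.get(n, 0.0) * v for n, v in map(parse_feature, features)))


# Toy training table of (features, label) rows; "bias" acts as an intercept term.
table = [
    (["bias", "clicks:3.0"], 1),
    (["bias", "clicks:2.5"], 1),
    (["bias", "clicks:0.5"], 0),
    (["bias", "clicks:0.2"], 0),
]
model = train_query(table, num_splits=2)
print(model)
print(predict(model, ["bias", "clicks:3.0"]), predict(model, ["bias", "clicks:0.2"]))
```

Averaging independently trained per-split weights is a simple, parallel-friendly way to combine models, and it maps onto an ordinary GROUP BY; the trade-off is that it can be less accurate than a single sequential pass over all the data.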

Presentations & Publications

ApacheCon Europe 2019: Apache Hivemall Meets PySpark [Slides] [Video]
ApacheCon North America 2019: What's New and Coming to Apache Hivemall (v0.5.2-incubating features and roadmap) [Slides]
RecSys 2018: Query-Based Simple and Scalable Recommender Systems with Apache Hivemall [Paper] [Poster] [Video]
ODSC Europe 2018: Apache Hivemall: Query-Based Handy, Scalable Machine Learning on Hive [Slides] [Video]

Learn More

Project: Apache Hivemall (incubating)
GitHub: apache/incubator-hivemall
Production: No-Coding ML Platform for Marketing

Author: Takuya Kitazawa

I am a product builder, mentor, and advocate for sustainable technology development with a decade of experience in AI/ML products, data systems, and digital transformation. Based in Canada and originally from Japan, I have lived and worked globally, including part-time residence in Malawi, Africa. Visit my portfolio to learn more about my work, or reach out to me by email.