Scope
Core contributor to Apache Hivemall, a library that enables SQL-based machine learning at scale on Apache Hive, Spark, and Pig. Contributions span new algorithms, PySpark integration, and community outreach.
Technology
- Platform: Apache Hive, Apache Spark (PySpark), Apache Pig
- Algorithms: Classification, Regression, Recommendation, Anomaly Detection, NLP, Topic Modeling
- Features: Field-Aware Factorization Machines, Data Sketching, Feature Engineering UDFs
- Integration: XGBoost, LightGBM, Digdag workflow engine
Key Contributions
- PySpark Integration: Enabled access to Hivemall functions from Python through a SparkSession with Hive support, combining the scalability of Spark and Hive with the flexibility of the Python ecosystem
- Algorithm Implementation: Contributed state-of-the-art generalized factor models and recommendation techniques
- Query-Based ML in Production: Simplified end-to-end ML workflows through a SQL-like interface, reducing complexity from numerous code fragments to dozens of lines of queries
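The PySpark integration described above can be sketched roughly as follows. This is a minimal illustration, not the project's official setup procedure: it assumes a Hivemall JAR is available locally (the path `hivemall-spark.jar` is a placeholder), and that the function-to-class mapping below matches the Hivemall release in use. The helper names `build_register_statements`, `create_session`, and `register_hivemall` are hypothetical and exist only for this example.

```python
# Sketch: exposing Hivemall functions to PySpark via a Hive-enabled SparkSession.
# Assumptions: the JAR path is a placeholder, and the UDF class name below
# should be checked against the Hivemall version you actually deploy.

# Assumed mapping of SQL function name -> implementing Java class.
HIVEMALL_FUNCTIONS = {
    "hivemall_version": "hivemall.HivemallVersionUDF",
}


def build_register_statements(functions):
    """Return one CREATE TEMPORARY FUNCTION statement per (name, class) pair."""
    return [
        f"CREATE TEMPORARY FUNCTION {name} AS '{cls}'"
        for name, cls in functions.items()
    ]


def create_session(jar_path="hivemall-spark.jar"):
    """Create a SparkSession with Hive support and the (assumed) Hivemall JAR attached."""
    # Imported lazily so the statement builder above works without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("hivemall-pyspark-sketch")
        .config("spark.jars", jar_path)
        .enableHiveSupport()
        .getOrCreate()
    )


def register_hivemall(spark, functions=HIVEMALL_FUNCTIONS):
    """Register each Hivemall function so it can be called from spark.sql()."""
    for statement in build_register_statements(functions):
        spark.sql(statement)
```

With a real JAR in place, usage would look like `spark = create_session(); register_hivemall(spark); spark.sql("SELECT hivemall_version()").show()`. Once registered, the functions are called from ordinary SQL strings, which is what keeps a full workflow within a handful of queries.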
Presentations & Publications
- ApacheCon Europe 2019: Apache Hivemall Meets PySpark [Slides] [Video]
- ApacheCon North America 2019: What's New and Coming to Apache Hivemall (v0.5.2-incubating features and roadmap) [Slides]
- RecSys 2018: Query-Based Simple and Scalable Recommender Systems with Apache Hivemall [Paper] [Poster] [Video]
- ODSC Europe 2018: Apache Hivemall: Query-Based Handy, Scalable Machine Learning on Hive [Slides] [Video]
Learn More
- Project: Apache Hivemall
- GitHub: apache/incubator-hivemall
- Production: No-Coding ML Platform for Marketing
Author: Takuya Kitazawa
I am a product builder, mentor, and advocate for sustainable technology development with a decade of experience in AI/ML products, data systems, and digital transformation. Based in Canada and originally from Japan, I have lived and worked globally, including part-time residence in Malawi, Africa. Visit my portfolio to learn more about my work, or reach out to me at [email protected].