I have extensive Spark experience, especially with large data sets. I have tuned many Spark programs, in some cases reducing processing time from a few days to several hours. Below is my summary.
A highly effective technical leader with over 25 years of experience, Andrew Kim specialises in data integration, data conversion, data engineering, ETL, big data architecture, data analytics, data visualization, data science, analytics platforms, and cloud architecture. He has a broad range of skills in building data platforms, analytic consulting, trend monitoring, data modelling, data governance and machine learning. Andrew Kim is recognised as a technical thought leader with the expertise to implement end-to-end architecture, design and delivery for big data, data warehouse and business intelligence projects. Over the last 10 years, he has successfully delivered several large data analytics platforms and applications for large Australian government and corporate clients. Andrew Kim has a bachelor's degree in Information Systems, a master's degree in Computer Science and a number of industry qualifications.
SPECIALTIES
TECHNICAL SKILLS
• Big Data (Hortonworks and Cloudera) – Spark (PySpark, Scala), Kafka, Hive, Impala, NiFi, HDFS, Sqoop, Ranger, YARN, Solr, SAM, Schema Registry, Superset
• Languages – Python, Scala, R, JavaScript
• Data Visualization – Tableau, Power BI, OBIEE, Domo
• Splunk & ELK Stack – Elasticsearch, Logstash, Filebeat, Kibana
• AWS – S3, Redshift,