Big Data Engineer / Developer – focus on Hadoop / Spark

Closed · Posted 6 years ago · Paid on delivery

Key Responsibilities

As (Senior) Big Data Engineer / Developer, you will work closely with IT architects to elicit requirements, optimize the system's performance and advance its technological foundation.

- Manage a very large-scale, multi-tenant, secure and highly available Hadoop infrastructure supporting rapid data growth for a wide spectrum of innovative internal customers
- Provide architectural guidance, plan and estimate cluster capacity, and create roadmaps for Hadoop cluster deployments
- Install Hadoop distributions, updates, patches and version upgrades
- Design, implement and maintain enterprise-level security (Kerberos, LDAP/AD, Sentry, etc.)
- Develop business-relevant applications in Spark, Spark Streaming and Kafka using functional programming methods in Scala (see the first sketch after this list)
- Implement statistical methods and machine learning algorithms to be executed in Spark applications that are automatically scheduled and run on top of the Big Data platform (see the second sketch after this list)
- Identify new components, functions and features, and drive them from exploration to implementation
- Create runbooks for troubleshooting, cluster recovery and routine cluster maintenance
- Troubleshoot Hadoop-related applications, components and infrastructure issues at large scale
- Design, configure and manage the strategy and execution for backup and disaster recovery of big data
- Provide third-level support (DevOps) for business-critical applications and use cases
- Evaluate and propose new tools and technologies to meet the needs of the global organization
- Work closely with the infrastructure, network, database, application, business intelligence and data science units
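To make the Spark Streaming / Kafka responsibility more concrete, here is a minimal Structured Streaming sketch in Scala (a running word count over a Kafka topic). The broker address `broker:9092`, the topic name `events` and the console sink are illustrative assumptions, not details taken from this posting.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, split}

object KafkaEventCounter {
  def main(args: Array[String]): Unit = {
    // Requires the spark-sql-kafka integration package on the classpath.
    val spark = SparkSession.builder()
      .appName("KafkaEventCounter")
      .getOrCreate()
    import spark.implicits._

    // Subscribe to a Kafka topic; broker and topic names are placeholders.
    val lines = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(value AS STRING) AS line")

    // Functional-style transformation: split each record into words
    // and maintain a running count per word.
    val counts = lines
      .select(explode(split($"line", "\\s+")).as("word"))
      .groupBy($"word")
      .count()

    // Console sink for demonstration; a production job would write to
    // HDFS, Kafka or a database instead.
    counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```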
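For the machine-learning responsibility, a minimal Spark MLlib sketch: training a logistic-regression model on a hypothetical Hive table. The table `analytics.customer_features`, its columns and the binary label `churned` are invented for illustration; a real job would be scheduled on the platform as the posting describes.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object ChurnModelSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ChurnModelSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical training data: numeric feature columns plus a
    // 0.0/1.0 label column named "churned".
    val raw = spark.table("analytics.customer_features")

    // Combine the feature columns into the single vector column MLlib expects.
    val assembler = new VectorAssembler()
      .setInputCols(Array("tenure_months", "monthly_spend", "support_calls"))
      .setOutputCol("features")
    val data = assembler.transform(raw)

    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

    val model = new LogisticRegression()
      .setLabelCol("churned")
      .setFeaturesCol("features")
      .fit(train)

    // Inspect predictions on the held-out split.
    model.transform(test)
      .select("churned", "prediction", "probability")
      .show(10)

    spark.stop()
  }
}
```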

Key Requirements, Skills and Experience

- University degree in computer science, mathematics, business informatics or another technical field of study
- Deep expertise in distributed computing and the factors determining and affecting distributed system performance

- Experience implementing Hadoop clusters in a large-scale environment, preferably including multi-tenancy and security with Kerberos
- At least 2 years of hands-on working experience with the Hadoop ecosystem, including Apache Spark, Spark Streaming, Kafka, ZooKeeper, JobTracker, HDFS, MapReduce, Impala, Hive, Oozie, Flume and Sentry, but also with Oracle, MySQL and PostgreSQL
- Strong expertise in functional programming, object-oriented programming and scripting, e.g. in Scala, Java, Ruby, Groovy, Python or R (see the sketch after this list)
- Proficiency with IDEs (IntelliJ IDEA, Eclipse, etc.), build automation (Maven, etc.) and continuous integration tools (Jenkins, etc.)
- Strong Linux skills; hands-on experience with enterprise-level Linux deployments as well as shell scripting (bash, tcsh, zsh)
- Well versed in installing, upgrading and managing Hadoop distributions (CDH 5.x), Cloudera Manager, MapR, etc.
- Hadoop cluster design, cluster configuration, server requirements, capacity scheduling, and installation of services: NameNode, DataNode, ZooKeeper, JobTracker, YARN, etc.
- Hands-on experience with automation, virtualization, provisioning, configuration and deployment technologies (Chef, Puppet, Ansible, OpenStack, VMware, Docker, etc.)
- Experience working in an agile and international environment; excellent time-management skills
- Excellent communication skills and a high level of motivation (self-starter)
- Strong sense of ownership to independently drive a topic to resolution
- Ability and willingness to go the extra mile and support the overall team
- Business-fluent English in speech and writing; German is a plus
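As a small illustration of the functional programming style the requirements call for, here is a plain-Scala sketch (no Spark needed) that parses toy log lines with pattern matching and aggregates them through pure, immutable collection transformations. The `LogLine` type, the line format and the sample data are invented for this example.

```scala
import scala.util.Try

final case class LogLine(host: String, status: Int, bytes: Long)

object LogStats {
  // Parse one whitespace-separated "host status bytes" line; malformed
  // input yields None instead of throwing, keeping the function total.
  def parse(line: String): Option[LogLine] =
    line.trim.split("\\s+") match {
      case Array(host, status, bytes) =>
        for {
          s <- Try(status.toInt).toOption
          b <- Try(bytes.toLong).toOption
        } yield LogLine(host, s, b)
      case _ => None
    }

  // Total bytes of server-error responses per host, built entirely from
  // immutable collection transformations.
  def errorBytesByHost(lines: Seq[String]): Map[String, Long] =
    lines
      .flatMap(parse)
      .filter(_.status >= 500)
      .groupBy(_.host)
      .map { case (host, ls) => host -> ls.map(_.bytes).sum }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "app01 200 512",
      "app02 503 1024",
      "app02 500 2048",
      "not a log line"
    )
    println(errorBytesByHost(sample)) // Map(app02 -> 3072)
  }
}
```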

Apache, Big Data Sales, Hadoop, Linux, Spark

Project ID: #16310248

About the Project

Remote project · Active 6 years ago