· Provides support and maintenance for Hadoop and its ecosystem components, including HDFS, YARN, Hive, LLAP, Druid, Impala, Spark, Kafka, HBase, Cloudera Workbench, etc.
· Accountable for storage, performance tuning, and volume management of Hadoop clusters and MapReduce routines.
· Deploys Hadoop clusters, adds and removes nodes, tracks jobs, monitors critical parts of the cluster, configures NameNode high availability, and schedules and takes backups.
· Installs and configures software, installs patches, and upgrades software as needed.
· Plans capacity and implements new and upgraded hardware and software releases for the storage infrastructure.
· Handles cluster design, capacity planning, cluster setup, performance fine-tuning, monitoring, scaling, and administration.
· Communicates with development, administration, and business teams, including infrastructure, application, network, database, and business intelligence teams.
· Responsible for Data Lake and Data Warehousing design and development.
· Collaborates with technical and non-technical resources, such as infrastructure and application teams, on project work, proofs of concept (POCs), and troubleshooting exercises.
· Configures and implements Hadoop security, specifically Kerberos integration.
· Creates and maintains job and task schedules and administers jobs.
· Responsible for data movement in and out of Hadoop clusters and for data ingestion using Sqoop and/or Flume.
· Reviews Hadoop environments and determines compliance with industry best practices and regulatory requirements.
· Models, designs, and implements data based on recognized standards.
· Serves as the key contact for vendor escalations.
· Participates in an on-call rotation to support a 24/7 environment and is expected to work outside business hours to support corporate needs.
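As a rough illustration of the NameNode high-availability duty above, a minimal hdfs-site.xml fragment for a quorum-journal HA pair might look like the following. The nameservice name, host names, and ports are placeholder assumptions for the sketch, not values from this posting.

```xml
<!-- Sketch only: nameservice and host names are hypothetical placeholders -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>nn2.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```

Automatic failover additionally requires ZooKeeper and the ZKFC processes; this fragment only shows the HDFS side of the configuration.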
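The Sqoop-based ingestion duty above can be sketched as a single import command. This is a hedged example: the JDBC URL, credentials, table name, and target directory are hypothetical placeholders, not details from this posting.

```shell
# Sketch only: connection string, user, table, and paths are hypothetical
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4
```

The `--num-mappers` setting controls how many parallel map tasks pull from the source database; it is typically tuned to what the source system can tolerate.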
Minimum Qualifications:
· Bachelor's degree in Information Systems, Engineering, Computer Science, or related field from an accredited university.
· Intermediate experience in a Hadoop production environment.
· Must have intermediate experience and expert knowledge with at least 4 of the following:
o Hands on experience with Hadoop administration in Linux and virtual environments.
o Well versed in installing and managing Hadoop distributions (e.g., Cloudera).
o Expert knowledge of and hands-on experience with Hadoop ecosystem components, including HDFS, YARN, Hive, LLAP, Druid, Impala, Spark, Kafka, HBase, Cloudera Workbench, etc.
o Thorough knowledge of the overall Hadoop architecture.
o Experience using and troubleshooting open-source technologies, including configuration management and deployment.
o Data Lake and Data Warehousing design and development.
o Experience reviewing existing database and Hadoop infrastructure and determining areas for improvement.
o Implementing a software lifecycle methodology to ensure supported-release and roadmap adherence.
o Configuring NameNode high availability.
o Scheduling and taking backups of the Hadoop ecosystem.
o Moving data in and out of Hadoop clusters.
o Good hands-on scripting experience in a Linux environment.
o Experience in project management concepts, tools (MS Project) and techniques.
o A record of working effectively with application and infrastructure teams.
· Strong ability to organize information, manage tasks, and use available tools to contribute effectively to the team and the organization.
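As a small illustration of the Linux scripting skill listed above, here is a minimal bash sketch that flags filesystems above a usage threshold. The threshold value and the `check_usage` helper are generic examples for the sketch, not requirements from this posting.

```shell
#!/usr/bin/env bash
# Sketch: warn when a filesystem's usage percentage exceeds a threshold.
# The 80% threshold is an arbitrary example value.
THRESHOLD=80

# check_usage <percent> <mountpoint> -> prints WARNING or OK for that mount
check_usage() {
  local pct="$1" mount="$2"
  if [ "$pct" -gt "$THRESHOLD" ]; then
    echo "WARNING: ${mount} is at ${pct}% capacity"
  else
    echo "OK: ${mount} is at ${pct}% capacity"
  fi
}

# In real use, feed it df output, e.g.:
#   df -P | awk 'NR>1 {gsub("%","",$5); print $5, $6}'
check_usage 91 /data   # above the threshold
check_usage 42 /       # below the threshold
```

In practice a script like this would run from cron and mail or page on WARNING lines rather than printing to stdout.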