Scope and Responsibilities:
As a Monitoring Engineer, you will work with a variety of talented Client teams, serving both as an expert for internal users and as a driving force for performance, best practices, and data integrity within the environment. You will own the entire data pipeline: advising application teams on proper data formats, troubleshooting and tuning forwarders and search clusters, and ultimately ensuring that timely, accurate data is searchable 24x7.
· Install, configure, and maintain log management solutions and tools.
· Design and build resilient, scalable data pipelines that transform and transport data to various reporting platforms.
· Build proactive monitoring solutions to detect failures across components of the pipeline.
· Conduct root cause analysis of problems and issues across data pipelines and platforms.
· Identify, analyze, and interpret trends and patterns in complex data sources.
· Adapt to a rapidly growing ecosystem and integrate demanding technologies with reporting solutions to provide complementary capabilities.
· Plan, engineer, and implement robust and cost-effective computing environments, exploiting emerging technologies to provide compelling solutions.
· Identify process improvements and build creative automated solutions.
· Inspect, cleanse, enhance, and monitor the integrity of a variety of data sources.
· Collaborate with the engineering team, application developers, management, and infrastructure teams to assess near- and long-term needs.
· Improve the standards of the consumer’s experience by providing metric visibility into application performance.
· Serve as a technical subject matter expert, helping uphold search best practices through user activity audits and functional alerting.
· Provide expertise to support groups that require performance and troubleshooting guidance, including query tuning and upholding best practices.
· Participate in process development with customers and service providers.
· Effectively communicate tool capabilities and processes to varying stakeholders.
· Participate in on-call rotations.
Qualifications:
· Bachelor’s degree in Computer Science, Information Technology, Mathematics, Business Administration, or a related field; or an equivalent combination of education and related work experience.
· Minimum of 5 years of experience with the implementation, operations, and maintenance of IT systems and/or administration of software functions in multi-platform, multi-system environments.
· Strong Linux (Red Hat, CentOS) background with troubleshooting and administration skills.
· Experience configuring and maintaining log interfaces/tools such as rsyslog and syslog-ng.
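For illustration only, the rsyslog work described above often amounts to a small forwarding configuration like the sketch below (the target host, port, and queue names are placeholders, not a real deployment):

```
# Illustrative /etc/rsyslog.d/50-forward.conf (RainerScript style)
# Forward all messages to a central collector over TCP with a
# disk-assisted queue so events survive a collector outage.
action(
  type="omfwd"
  target="logs.example.com"        # placeholder collector host
  port="514"
  protocol="tcp"
  queue.type="LinkedList"          # in-memory queue, spills to disk
  queue.filename="fwdqueue"        # enables disk assistance
  action.resumeRetryCount="-1"     # retry forever instead of dropping
)
```

The queue settings are the part that matters for the "timely and accurate data" goal in this role: without them, rsyslog discards events when the downstream collector is unreachable.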
· Experience implementing/maintaining log management tools such as Splunk, ELK, Devo, Sumo Logic, Microsoft Operations Management Suite, and NXLog.
· Deep understanding of NXLog configurations, modules, and integrations with various applications.
· Experience implementing NXLog Manager and managing NXLog collectors at scale.
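As a rough sketch of the NXLog collector work above, a minimal configuration tails an application log and ships it as JSON to a central collector (the file path, host, and port below are assumptions for illustration):

```
# Illustrative nxlog.conf fragment
<Extension json>
    Module  xm_json
</Extension>

<Input app_log>
    Module  im_file
    File    "/var/log/app/app.log"   # placeholder application log
</Input>

<Output collector>
    Module  om_tcp
    Host    collector.example.com    # placeholder collector host
    Port    10514
    Exec    to_json();               # serialize each event as JSON
</Output>

<Route app_to_collector>
    Path    app_log => collector
</Route>
```

Managing collectors "at scale" then largely means templating fragments like this per application and distributing them via a configuration management tool.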
· Strong knowledge of AWS, Google Cloud, and Azure architecture and services.
· Understanding of and experience with configuration management tools and concepts such as Puppet, Chef, Ansible, and SCCM.
· Experience with orchestration tools such as Terraform and AWS CloudFormation.
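A minimal Terraform sketch of the kind of "cost-effective computing environment" work this role involves, assuming AWS and placeholder resource names:

```
# Illustrative Terraform: an S3 bucket for archived logs with a
# lifecycle rule that expires old objects to control storage cost.
provider "aws" {
  region = "us-east-1"               # placeholder region
}

resource "aws_s3_bucket" "log_archive" {
  bucket = "example-log-archive"     # placeholder bucket name
}

resource "aws_s3_bucket_lifecycle_configuration" "expire_logs" {
  bucket = aws_s3_bucket.log_archive.id

  rule {
    id     = "expire-old-logs"
    status = "Enabled"
    filter {}                        # apply to the whole bucket
    expiration {
      days = 90                      # drop archived logs after 90 days
    }
  }
}
```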
· Advanced scripting skills, including but not limited to shell scripting and supporting languages: Python, Bash, Perl, and regular expressions.
· Experience with logging practices and log transport, data on-boarding, and field extractions.
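To make "field extractions" concrete, the sketch below pulls named fields out of a raw log line with a regular expression; the line format and function name are assumptions for illustration, not any specific platform's API:

```python
import re
from typing import Optional

# Illustrative field extraction: pull timestamp, level, and message out of
# a raw log line. The log format here is an assumed example, not a real
# application's format.
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<message>.*)$"
)

def extract_fields(line: str) -> Optional[dict]:
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

fields = extract_fields("2024-05-01T12:00:00 ERROR disk quota exceeded")
# fields -> {'timestamp': '2024-05-01T12:00:00', 'level': 'ERROR',
#            'message': 'disk quota exceeded'}
```

Search platforms such as Splunk express the same idea declaratively (named capture groups in extraction rules), so comfort with regex-based extraction transfers directly.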
· Experience with monitoring solutions and methodologies, including server and network performance, hardware, web synthetics, and application performance monitoring, is a plus. Relevant tools include, but are not limited to, New Relic, ScienceLogic, Splunk, ExtraHop, AppDynamics, Dynatrace, Keynote, Microsoft SCOM, and SolarWinds Orion.
· Strong communication skills and the ability to interact with management and all teams involved in the operational or development process.