Lead Site Reliability Engineer

Company	Chase
Address	London, Greater London
Form of work	Full Time
Category	IT

Job description

Building on the successful launch of Chase in 2021, we’re a new team with a new mission. We’re creating products that solve real-world problems and put customers at the center, all in an environment that nurtures skills and helps you realize your potential. Our team is key to our success. We’re people-first. We value collaboration, curiosity and commitment.

As a Site Reliability Engineer at JPMorgan Chase within the Platform Engineering team, you are the heart of this venture, focused on getting smart ideas into the hands of our customers. You have a curious mindset, thrive in collaborative squads, and are passionate about new technology. By nature, you are also solution-oriented, commercially savvy and have a head for fintech. You enjoy working in tribes and squads that focus on specific products and projects – and depending on your strengths and interests, you'll have the opportunity to move between them.

While we’re looking for professional skills, culture is just as important to us. We understand that everyone's unique – and that diversity of thought, experience and background is what makes a good team great. By bringing people with different points of view together, we can represent everyone and truly reflect the communities we serve. This way, there's scope for you to make a huge difference – to us as a company, and to our clients and business partners around the world.

Job responsibilities

Demonstrates and champions Site Reliability culture and practices and exerts technical influence throughout your team
Collaborates with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers
Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise
Acts as the main point of contact during major incidents for your application and demonstrates the skills to identify and solve issues quickly to avoid financial losses
Documents and shares knowledge within your organization via internal forums and communities of practice
Drives incident response efforts, ensuring timely resolution and post-incident analysis to prevent future occurrences
Runs the production environment by monitoring availability and taking a holistic view of system health
Measures and optimises system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement
Provides primary operational support and engineering for multiple large-scale distributed software applications

Required qualifications, capabilities, and skills

Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other Site Reliability best practices with the ability to implement these practices within an application or platform
Fluency in at least one programming language (e.g., Python, Java Spring Boot, .NET)
Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines
Proficiency and experience in observability, including white-box and black-box monitoring, SLO alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
Proven work experience as an SRE or in a similar role running large-scale systems in production
Proven public or private cloud experience (GCP, AWS, or Azure preferred)
Infrastructure as code: use Terraform and GitLab CI/CD for automation, containerize our environments (Kubernetes), and leverage cloud technologies to meet our goals
Systems: manage, configure and troubleshoot operating system issues, storage (block and object), networking (VPCs, proxies and CDNs), and administer high-availability PostgreSQL and Redis clusters
Monitoring and instrumentation: implement metrics in Prometheus and Grafana, log management and related systems, and Slack/PagerDuty integrations
Extensive Kubernetes operational experience (ideally including Istio, ArgoCD)

Our Technology Stack:

GCP (AWS and Azure to come)
Kubernetes, ArgoCD, Helm, Ambassador, Istio
JVM-based languages, Go
Infrastructure as Code (Pulumi, Terraform, Crossplane)
Grafana Cloud
Pulsar, CockroachDB, HashiCorp Vault

We prefer to be co-located but we understand that people need flexibility.

ABOUT US

J.P. Morgan is a global leader in financial services, providing strategic advice and products to the world’s most prominent corporations, governments, wealthy individuals and institutional investors. Our “first-class business in a first-class way” approach to serving clients drives everything we do. We strive to build trusted, long-term partnerships to help our clients achieve their business objectives.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. In accordance with applicable law, we make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as any mental health or physical disability needs.

ABOUT THE TEAM

Our Corporate & Investment Bank relies on innovators like you to build and maintain the technology that helps us safely service the world’s important corporations, governments and institutions. You'll develop solutions that help the bank provide strategic advice, raise capital, manage risk, and extend liquidity in markets spanning over 100 countries around the world.

Refer code: 2688696. Chase - The previous day - 2024-02-03 03:33