Edible Arrangements
Site Reliability Engineer
Job Description
- Req#: 578482
Site Reliability Engineer (SRE)
Who are we, and what do we do?
Edible Brands® is an innovative, Atlanta-based company that acquires, develops, and manages a world-class portfolio of consumer and service brands. From our flagship brand, Edible Arrangements®, to the diverse portfolio under our umbrella—including Rōti Modern Mediterranean®, edible.com®, edibles.com™, freshfruit.com™, and BerryDirect®—we’ve revolutionized the way people experience food.
By combining exceptional products with cutting-edge e-commerce platforms, proprietary software, and a robust supply chain, we create memorable and accessible experiences for customers across the globe. As a dynamic, forward-thinking company, we are constantly evolving to deliver high-quality, innovative solutions that resonate with our customers and franchisees worldwide.
Location: This is an onsite role based at our Corporate Office in Sandy Springs, GA, with a Monday–Friday schedule.
Purpose:
As a Site Reliability Engineer (SRE), you will be responsible for ensuring the resilience and reliability of our e-commerce applications through monitoring, automation, and proactive site maintenance. You will leverage Datadog, Azure Application Insights, and other industry-standard tools to develop robust monitoring systems that enhance site awareness, detect and respond to incidents, and maintain high availability. You will also drive collaboration across engineering teams to build a proactive approach to system health, site reliability, and incident management.
Responsibilities:
- Monitoring and Site Reliability
  - Develop, implement, and manage monitoring and alerting systems using Datadog, Azure Application Insights, and other related technologies to gain real-time awareness of system health and potential issues.
  - Ensure integration of Datadog with .NET, Node.js and React-based applications for comprehensive monitoring of application performance and health.
  - Establish proactive monitoring practices to reduce site outages, gain insight into system performance, and identify blockers within Azure DevOps pipelines.
- Incident Response and Site Health
  - Design and implement Standard Operating Procedures (SOPs) to effectively respond to and resolve incidents, minimizing downtime and ensuring prompt recovery.
  - Collaborate with engineering and product teams to establish and execute comprehensive incident response plans, focusing on improving the availability, performance, and reliability of e-commerce platforms.
- System Maintenance and Improvement
  - Optimize Azure DevOps pipelines to ensure blockers, errors, and any build issues are proactively addressed, enhancing site deployment efficiency and reliability.
  - Maintain and improve application performance and resilience through enhancements in Azure Application Services, Azure Front Door, and Azure Application Gateway.
  - Execute SQL queries to assess and troubleshoot database performance and availability issues related to the operational health of the site.
- Collaboration and Documentation
  - Work closely with developers to ensure that monitoring tools are embedded effectively into the development cycle and are aligned with the business needs.
  - Create detailed documentation, including SOPs, best practices, incident management guides, and monitoring configurations.
- Continuous Improvement and Learning
  - Stay current with emerging monitoring technologies and identify opportunities to apply them to enhance the platform's reliability and scalability.
  - Promote a culture of learning and proactive improvement through root cause analysis and post-incident reviews to prevent repeat occurrences.
Requirements:
- 5+ years of experience in Site Reliability Engineering, preferably within an e-commerce or high-traffic web application environment.
- Strong expertise with Datadog, including setting up integrations, creating custom metrics, dashboards, and alerts, specifically in .NET, Node.js, and React applications.
- Proven experience with Azure Application Insights, Azure DevOps, and the ability to implement monitoring and alerting solutions in cloud environments.
- Hands-on experience managing and optimizing Azure App Services, Azure Front Door, Azure Application Gateway, and SQL databases from a resilience and performance standpoint.
- Familiarity with SOP development for incident management, proactive monitoring, and site reliability.
- Knowledge of CI/CD pipelines in Azure DevOps, and experience in identifying and resolving build blockers and pipeline issues.
- Strong skills in writing SQL queries to diagnose and resolve issues.
Essential Competencies:
- Excellent interpersonal skills, with an emphasis on collaboration, clear communication, and the ability to explain technical concepts to non-technical stakeholders.
- Ability to work in a fast-paced environment, with strong analytical and problem-solving skills, and a proactive mindset towards automation and improvement.
What will set you apart:
- Advanced certifications in Azure (e.g., Azure DevOps Engineer Expert, Azure Solutions Architect).
- Extensive experience with high-traffic e-commerce applications and a track record of ensuring uptime and resilience.
- Experience with other monitoring and observability tools (e.g., Grafana, Prometheus) is a plus.
What We Offer:
- Onsite work environment, fostering collaboration and relationship building with peers, cross-functional partners and leadership.
- The stability and resources of an industry-leading company successfully operating for 25 years, with the agility and innovation of a startup, allowing you to make a significant impact and shape our future.
- Growth & Development – Each team member has a visible and immediate impact on the business, offering abundant opportunities for personal and professional growth as we scale in size and sophistication.
- Healthcare plans that include health/dental/vision insurance, 401K Plan, company-paid life insurance and short-term disability, flexible spending account options and more.
- Paid time off, including sick days & holidays to support work-life balance.
We are proud to be an EEO/AA employer. Applicants for employment are considered without regard to race, creed, color, religion, sex, sexual orientation, marital status, national origin, age, disability, status as a veteran or Vietnam Era Veteran, or membership in the Reserves or National Guard.