Systems Development Eng (AWS Generative AI & ML Servers), AWS Hardware Engineering Accelerators

Are you a systems thinker? Do you want to build the backbone of the Generative AI cloud at AWS? Do you want to build the future of the cloud for AI training and inference? Do you want to do industry-leading work delivering continuous price-performance improvements in the cloud for AI model training for multi-billion-parameter LLMs? Come join us in designing, delivering and operating AWS cloud offerings that enable high performance and scalability in AI/ML and HPC workloads.

Are you intrigued by the continuous release of newer AWS services and instance types that solve newer, bigger and more interesting business problems every day? Does that make you wish your talents were applied to those problems at cloud scale? If yes, then come join us - we are looking for builders like you.

The AWS Hardware Engineering team creates server designs for Amazon’s innovative web services. Our designs are industry-leading in frugality and operational excellence, and are critical to the success of the AWS business and the millions of customers who use AWS today. Our engineers solve challenging technology problems and build architecturally sound, high-quality components to enable AWS to realize critical business strategies.

The ideal candidate for this role will be an innovative self-starter. You know the full technical stack, vertically from bare-metal server hardware up to the software in userland, and everything in between. You have tremendous interest in cloud scale and are curious about how systems and software decisions impact the user. You insist on the highest standards and are able to develop tactical solutions and tools to diagnose and fix issues. You are an excellent systems debugger, finding interaction issues between components on server systems. You are a leader with strong organizational, planning, and communication skills. You are a builder!

What you will do?
You will work with engineers across the company to deliver the next-generation AWS platforms.
You will have a direct impact on our bottom line and the ability to deliver improvements for AWS. You will be part of a growing, fast-paced, and fun team. You will have ownership of the implementation of your work. You will see direct product improvements based on the results of your work.

AWS Engineers are shaping the way people use computers and designing the future of cloud computing technology. Come help us make history!

Why it matters?
Public cloud IT services represent the majority of growth in the overall IT services market and will continue to do so for several years to come. The scale of AWS, combined with an understanding of how our software and hardware are used, creates a unique opportunity for component customizations that will directly benefit our customers.

Why you will love it?
You will work with engineers across the company to deliver the next-generation AWS platforms. You will have a direct impact on our bottom line and the ability to deliver improvements for AWS. You will be part of a growing, fast-paced, and fun team. You will have ownership of the implementation of your work. You will see direct product improvements based on the results of your work.

AWS Infrastructure Services (AIS) owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we’re the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain, and we’re looking for talented people who want to help. You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles.
You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.

Key job responsibilities
You will be a technical leader solving complex architectural problems which may not be defined beforehand. You will own the team's systems and work proactively to identify deficiencies, write tactical code to solve issues before they impact customers, and work with your team to scale the solution. You will decompose big, difficult server system testability, reliability and diagnosis problems into straightforward tasks, components or features that you will deliver yourself and lead others to deliver in parallel. You will use a combination of hardware, software, system design, x86 architecture, process, diagnosis and operations knowledge.

A day in the life
Working with a variety of job roles (SDEs, SDETs, Hardware Engineers, TPMs, Managers, Principals) and groups (AWS Hardware Engineering, EC2, other AWS services) through server conception, test, launch, and operations. Driving high quality and reliability into future/new designs for AWS Accelerated server solutions for AWS Cloud.

About the team
We are the builders of AWS backbone (server) infrastructure. The team gets involved in the early NPI phase for Accelerators for AWS cloud servers, and continues through launch and thereafter. Our software, services and automation are critical in design, EC2 instance build yields, launch go/no-go decisions, and fleet health thereafter. We build process automation systems to improve efficiency across the board.

About AWS

Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply.
If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.

Why AWS?
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating. That’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.

Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.

Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.

Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
BASIC QUALIFICATIONS
- 2+ years of non-internship professional software development experience
- 1+ years of designing or architecting (design patterns, reliability and scaling) of new and existing systems experience
- 7+ years of administrative experience in networking, storage systems, operating systems and hands-on systems engineering experience
- Knowledge of systems engineering fundamentals (networking, storage, operating systems)
- Experience programming with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, Ruby
- Experience with modern technology devices in storage, network, memory as well as a variety of interface standards and protocols (I2C, IPMI, SPI, PCIe)
- Linux kernel and user-space drivers for PCIe and external devices
- Experience with x86 architecture, as well as ARM, and GPU/FPGA devices
- Strong focus on reliability, scale and diagnostics (including developing tactical and strategic tools using Python, Go, C/C++ or any other suitable high-level language)
- Passion for delivering tactical and strategic solutions to drive server build tempo and quality
- 5+ years in software development, systems development, SRE (Site Reliability Engineering), or Resilience Engineering
- 5+ years of SysDE (Systems Development Engineer) or equivalent experience
- 5+ years of server systems debug experience; debugging and root causing complex server platforms
- 5+ years of experience contributing towards increasing durability, security, availability and scalability of systems through exploration, diagnosis and remediation
- System thinking: ability to diagnose interactions between discrete components of a server system and drive product improvements
- A strong understanding of OS internals, including network and storage subsystems ...

Technical Account Manager, ES - SI - NICE

Sales, Marketing and Global Services (SMGS)
AWS Sales, Marketing, and Global Services (SMGS) is responsible for driving revenue, adoption, and growth from the largest and fastest-growing small- and mid-market accounts to enterprise-level customers, including public sector. The AWS Global Support team interacts with leading companies and believes that world-class support is critical to customer success. AWS Support also partners with a global list of customers that are building mission-critical applications on top of AWS services.

Would you like to join one of the fastest-growing organizations within Amazon Web Services (AWS) and help customers of all industries and sizes gain the best value and service from AWS? AWS Enterprise Support Technical Account Managers (TAMs) support our customers’ creative and transformative spirit of innovation across all technologies, including Compute, Storage, Database, Big Data, Application-level Services, Networking, Serverless, Deployment, Security and more. This is not a sales role, but rather an opportunity to be the principal technical advisor and ‘voice of the customer’ to organizations ranging from start-ups to Fortune 500 enterprises.

The TAM role is not directly hands-on-keyboard within the customer’s environment for troubleshooting customer support issues; rather, you will work with appropriate engineers and service teams to see issues through to resolution.
More importantly, you will work proactively to help craft and execute strategies to drive our customers' adoption and use of AWS services, including EC2, S3, DynamoDB & RDS databases, Lambda, CloudFront CDN, IoT, and many more.

Your technical acumen and customer-facing skills will enable you to effectively represent AWS within a customer’s environment, and drive discussions with senior leadership regarding incidents, trade-offs, support and risk management.

You will provide advocacy and strategic technical guidance to help plan and build solutions using best practices, and proactively keep your customers’ AWS environments operationally healthy and resilient.

The close relationships developed with your customers will allow you to understand their business/operational needs and technical challenges, and help them achieve the greatest value from AWS. This position will require the ability to travel 10% or more as needed.

The TAM is the centerpiece of value to our Enterprise Support customers. If you wish to be at the forefront of innovation, come join us!

Key job responsibilities
As a TAM, you will craft and execute technical cloud strategies to drive customers' adoption and use of AWS services. Your technical acumen and customer-facing skills will enable you to effectively represent AWS to our customers, and drive discussions with senior leadership regarding operational excellence, cloud maturity, support, and risk management.

You will provide advocacy and strategic technical guidance to plan and build solutions using best practices, and proactively keep the customers' AWS environments operationally healthy. The close relationships developed with your customers will allow you to understand their business/operational needs and technical challenges, and help them achieve the greatest value from AWS. This position will require the ability to travel 10% or more as needed.

The TAM is the centerpiece of value to our Enterprise Support customers.
If you wish to be at the forefront of innovation, come join us!

A day in the life
https://www.youtube.com/watch?v=l48WpOmM5j4

About the team

About AWS

Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.

Why AWS?
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating. That’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.

Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.

Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.

Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why flexible work hours and arrangements are part of our culture.
When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.

BASIC QUALIFICATIONS
- Bachelor’s Degree in Computer Science, Math, or related discipline, and 2+ years of equivalent work experience or 4+ years of related work experience
- 2+ years of technical engineering experience
- Experience in Information Technology operations ...

Software Development Engineer (SDE), Datacenter Networks

AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we’re the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain, and we’re looking for talented people who want to help. You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.

We are building a radically new type of datacenter network fabric. We expect these fabrics to power all new AWS datacenters globally, and to set the industry standard for the next decade. Our team is building software systems to dynamically generate network topologies on demand, to implement a novel routing algorithm, and to monitor the health of datacenter fabrics. Your software will enable our internal customers to visualize always-changing network paths, automate network remediation and deployment, and improve performance for customers. You will join a group that sets the direction of its product and iterates fast to continuously improve it and delight customers.
Our group fosters positivity within the team to create a happy and inclusive work environment, and it values self-investment as a core part of its success.

Key job responsibilities
A Software Development Engineer within the datacenter networking team at AWS owns the design and development of key components of Amazon’s next-generation datacenter fabric. This includes building tools to provide network visibility to a wide range of internal operators, and systems to automate wiring and incremental expansion of DC networks, globally. You will collaborate with Network Engineering/DC Operations teams, and with a wide range of teams operating provisioning, configuration, deployment, monitoring and scaling tools.
- Develop world-class software systems for automating Amazon's network.
- Provide technical direction to the team and identify areas of focus.
- Create and review software design documentation.
- Collaborate with Principal Engineers and Scholars to ensure fast, smooth roll-out of new designs and products.
- Own the operational excellence of the software you put into production.
- Contribute to improving our documentation, processes, and tools so that we improve our performance as a team.

About the team

Why AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating. That’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.

Diverse Experiences
Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.

Work/Life Balance
We value work-life harmony.
Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.

Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.

Mentorship and Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.

Working at AWS in the Core Networking Team
Meet Matt, Director, Core Networking: https://youtu.be/DqTStjRtjX4

BASIC QUALIFICATIONS
- 3+ years of non-internship professional software development experience
- 2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- Experience programming with at least one software programming language ...

Software Dev Engineer II, Network Platform Development (NetOS)

AWS is looking for Software Development Engineers (SDEs) to help build and maintain new services and solutions that power the world's largest cloud network. Our customers demand the highest quality, reliability and security for their services. As we expand at a tremendous rate, we are seeking brilliant and passionate SDEs to join our NetOS Fleet Health software development team.

You will be responsible for designing and developing software solutions for monitoring systems that need to operate at an extreme scale and with the highest reliability and security standards. As a software technical leader of this team, you will dive deep to understand Amazon's network and service architecture, its operation and security. You will partner with network engineering, software and hardware team members and other AWS service teams to develop the software for our services. This team focuses on developing software that monitors our networking devices. We are looking for the highest quality candidates for a world-class team.

Why would you want to work in AWS Networking?
- We are making history!
- We have some of the largest data center networks in the world and we keep growing.
- Because we own both the network and the devices, we can innovate in a way that others cannot.
- We have a very large impact: everything AWS does is built on networks using these devices.

This is the SDE2 role in Networking Platform Development (Fleet Health team) in Cupertino, CA. Work will involve software and service infrastructure development on native AWS services.

AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we’re the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on.
We work on the most challenging problems, with thousands of variables impacting the supply chain, and we’re looking for talented people who want to help. You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.

About the team

Why AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating. That’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.

Diverse Experiences
Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.

Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.

Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences.
Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.

Mentorship and Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.

Working at AWS in the Core Networking Team
Meet Matt, Director, Core Networking: https://youtu.be/DqTStjRtjX4

BASIC QUALIFICATIONS
- 3+ years of non-internship professional software development experience
- 2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- Experience programming with at least one software programming language ...

Software Development Manager, AWS Neuron Machine Learning Distributed Training - Model Enablement

AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators and the Trn1 and Inf1 servers that use them. As the SDM of Software Development for the Machine Learning Distributed Training team, you will be responsible for leading a strong team of engineers and managers to help design and deploy these new products. A successful candidate will have an established background in developing Machine Learning products with direct customer-facing experience, strong technical ability and a motivation to achieve results. Experience in Machine Learning and software development is also a must.
- Be responsible for the full development life cycle of our integrations and extensions for inference and training support in PyTorch, XLA, JAX as well as distributed training libraries like FSDP, DDP and others. This includes enabling models that use MoE architectures and newer future architectures.
- Lead the way to ensure support for key ML functionality in a combined chip/software platform.
- Ensure the right thing is being built and delivered to customers.

Key job responsibilities
Our engineers and managers collaborate across diverse teams, projects, and environments to have a firsthand impact on our global customer base. You’ll bring a passion for innovation, data, search, analytics, and distributed systems. You’ll also:
- Solve challenging technical problems, often ones not solved before, at every layer of the stack.
- Design, implement, test, deploy and maintain innovative software solutions to transform service performance, durability, cost, and security.
- Build high-quality, highly available, always-on products.
- Research implementations that deliver the best possible experiences for customers.

A day in the life
You will work with the executive leadership and other senior management and technical leaders to define product directions and deliver them to customers. We build massive-scale distributed training and inference solutions.
This organization builds the full stack of software, servers and chips to accelerate at the highest scale.

About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help our team members develop their engineering expertise so they feel empowered to take on more complex tasks in the future.

Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.

About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating. That’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.

Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.

Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.

Hybrid Work
We value innovation and recognize this sometimes requires uninterrupted time to focus on a build. We also value in-person collaboration and time spent face-to-face. Our team affords employees options to work in the office every day or in a flexible, hybrid work model near one of our US Amazon offices. Our hybrid models allow you the freedom to work from home whenever in-office collaboration isn’t necessary.

BASIC QUALIFICATIONS
- 3+ years of engineering team management experience
- 7+ years of working directly within engineering teams experience
- 3+ years of designing or architecting (design patterns, reliability and scaling) of new and existing systems experience
- 8+ years of leading the definition and development of multi-tier web services experience
- Experience partnering with product or program management teams
- Knowledge of engineering practices and patterns for the full software/hardware/networks development life cycle, including coding standards, code reviews, source control management, build processes, testing, certification, and livesite operations
- 3+ years of deep learning/machine learning experience ...

Manager, Optical Network Development, NPD Interconnect

Do you like to design, operate, and implement large-scale networks? Would you like to play a key role in supporting all aspects of connectivity between Amazon and the outside world, as well as the connectivity between Amazon’s data centers and services, and in designing and architecting innovations so that networks are fail-proof and can scale and grow without limit?

AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we’re the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain, and we’re looking for talented people who want to help. You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.

The Core Networking team is looking for an Optical Network Development Manager to join our Network Product Development (NPD) team. As an Optical Network Development Manager, you will be responsible for building, deploying and scaling the Amazon networks that support AWS, customers, and other business units, across multiple global data centers. AWS Core Networking is focused on building Data Centers and the network that allows Data Centers to function efficiently. They own the solutions that allow racks to be aggregated and Data Centers to be interconnected.
Core Networking’s goal is to balance efficiency, performance and reliability to allow customers access to their applications and data.

Key job responsibilities
As an Optical Network Development Manager, you will be responsible for building and managing a team of highly skilled Optical Network Development and Software Engineers, who are responsible for end-to-end ownership of defining, designing and developing the optical solutions for our Data Center Network. At Amazon, we ask our technical managers to be talented engineers who can dive deep and get involved, in addition to being leaders.

Responsibilities:
• Roadmap and Execution: As the Optical Network Development Manager in the Network Product Development team, you will be integral in understanding optical technology trends and our customer requirements to drive the development roadmap while working with our partners to deliver optical platforms (software and hardware) that meet unique requirements. You will be responsible for developing and improving project execution. This includes creating processes, procedures and automation to improve efficiency in our day-to-day tasks and projects. You will work closely on supporting our internal customers and ensuring that their needs and issues are being addressed.
• Operations: As an Optical Network Development Manager, you will be expected to drive the quality of our optical solutions in our network. This includes defining metrics that help us focus on areas that contribute to delivering operationally friendly products, and working with cross-functional teams to develop monitoring solutions for optical platforms.
• Performance Management/Team Health: You will own all facets of performance and career management for the team. Regular one-on-one meetings with all team members are required. You will be expected to provide both technical and ‘soft skill’ mentoring in order to maintain a well-rounded, world-class organization.
This includes project management, quality audits and coordination of training sessions with senior-level engineers.• Recruiting and Hiring – You will take the lead in hiring quality personnel who not only fit the needs of the current organization but also will allow the team to scale with platform and service growth. You will coordinate with Amazon and external recruiting staff to evaluate potential candidates, participate in initial phone screens and provide relevant guidance and feedback during onsite interview loops. You will also be responsible for ensuring that proper training takes place for all new hires. A day in the lifeOn an everyday basis as part of our team, you have the unique opportunity to understand the growing AWS network and our internal customers’ requirement on interconnect solutions. You'll work backwards to devise hardware solutions by influencing the broad industry and/or to develop software tools with sister teams to maintain a highly available network that delights AWS customers. You design and implement processes and mechanisms that both help the team to deliver business impact to the organization in a systemic way, while also helping to raise the bar on our operational excellence. Operating at the scale we do, there is no blueprint for how to do what we do, which encourages our engineers to identify and develop simple solutions to complex problems. We encourage durable solutions that look around corners while taking into consideration our customer needs from a cost, performance, and reliability perspective. We work closely with our internal partners that design, build and operate the network to ensure that our solutions meet their needs and exceed their expectations.About the teamWithin AWS Networking the NPD (Network Product Development) organization is responsible for, designing the hardware, building the software, and owning the interconnects for the routers that power the global AWS network. 
Beyond product delivery we actively manage the fleet or routers in a network that grows by 70% annually. This means tracking key business and operational metrics to ensure that we operate smoothly and minimize or eliminate customer impact due to device related issues for a transparent AWS customer experience.Working at AWS in the Core Networking Team - Meet Matt, Director, Core Networking -- Link BASIC QUALIFICATIONS- • Master’s Degree in Electrical Engineer or related field.- • 8+ years of industry experience in delivering optical solutions.- • 5+ years of experience building optical transceivers for data center networks.- • 4+ years of management experience including hiring/retention/development.- • 2+ years of experience delivering Optical Network solutions to customers. ...

Senior SoC Functional Modeling Engineer, AWS Machine Learning Accelerators

Custom SoCs (Systems on Chip) are the brains behind AWS’s Machine Learning servers. Our team builds C++ & SystemC functional models of these custom-designed accelerator SoCs for use by AWS internal teams. We’re looking for a Senior SoC Modeling Engineer to join the team and deliver new functional models, infrastructure, and tooling for our customers.
As part of the ML accelerator modeling team, you will:
- Develop and own SoC functional models end-to-end, including model architecture, integration with other model or infrastructure components, testing, and debug
- Work closely with architecture, RTL design, design verification, emulation, and software teams to build, debug, and deploy your models
- Innovate on the tooling you provide to customers, making it easier for them to use our SoC models
- Drive model and modeling infrastructure performance improvements to help our models scale
- Develop software which can be maintained, improved upon, documented, tested, and reused
Annapurna Labs, our organization within AWS, designs and deploys some of the largest custom silicon in the world, with many subsystems that must all be modeled and tested with high quality. Our SoC model is a critical piece of software used in both our SoC development process and by our partner software teams. You’ll collaborate with many internal customers who depend on your models to be effective themselves, and you'll work closely with these teams to push the boundaries of how we're using modeling to build successful products.
You will thrive in this role if you:
- Are an expert in functional modeling for SoCs, ASICs, TPUs, GPUs, or CPUs
- Are comfortable modeling in C++ or SystemC, and familiar with Python
- Enjoy learning new technologies, building software at scale, moving fast, and working closely with colleagues as part of a small team within a large organization
- Want to jump into an ML-aligned role, or get deeper into the details of ML at the hardware/system level
Although we are building machine learning chips, no machine learning background is needed for this role. This role spans modeling of the ML and management regions of our chips, and you’ll dip your toes into both. You’ll be able to ramp up on ML as part of this role, and any ML knowledge that’s required can be learned on the job.
This role can be based in either Cupertino, CA or Austin, TX. The broader team is split between the two sites, with a slight preference for CA, due to colocation with more customer teams.
We're changing an industry. We're searching for individuals who are ready for this challenge, who want to reach beyond what is possible today. Come join us and build the future of machine learning!
A day in the life
A few videos help explain what the Annapurna Labs ML team is working on:
- https://youtu.be/4nfkonjjICo?si=nKhM1Wv4108mnIOa
- https://youtu.be/n38WDflRbjQ?si=SZEV5i_5du1jKYP-
About the team
AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never require sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- 6+ years of non-internship professional experience writing functional or performance models
- Experience programming with C++ and/or SystemC
- Familiarity with SoC, CPU, GPU, and/or ASIC architecture and micro-architecture ...

Senior Software Development Engineer, Annapurna Labs, Trainium Collectives, Elastic Collectives

We are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations, the fundamental operations that enable AI to scale across multiple accelerators & servers. Most of our stack is C/C++ and relatively low level, so solid knowledge of Linux, kernels, and performant code is important. Experience with embedded systems is valued, and experience with high-speed networking or HPC interconnects is valued highly.
If you like solving hard problems, want to work with HPC and ML customers, iterate fast and deliver meaningful solutions at scale, then come join us! This truly is a role at the forefront of AI/ML: you’ll be working on features for the largest clusters, with the largest customers, for the largest AI models.
The org you would be joining is Annapurna Labs, an integral part of AWS that develops hardware and software components that are critical building blocks for EC2 infrastructure. Every instance in EC2 is running some type of hardware designed in Annapurna Labs. We specialize in designing software, systems and chips that optimize the AWS customer experience.
A day in the life
Annapurna Labs, a crucial part of AWS, is responsible for developing hardware and software components for EC2 infrastructure. Our team focuses on building networking solutions for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS.
We have mixed-discipline orgs; you’d be working side by side with infrastructure experts, hardware engineers, RTL engineers, scientists & architects. Our workforce spans the globe and is truly international; you’ll find yourself working side by side with individuals from numerous countries. We take mentorship seriously: you can expect senior mentorship and will be expected to mentor new and junior engineers. The pace is fast as we work on the latest advancements of AI/ML, but we take the time to bond as a team and enjoy the successes. We offer flexibility in working hours, and respect work-life balance (WLB) as a core org tenet. The team enjoys working with numerous principal-level engineers and closely with directors; career growth opportunities are certainly available. This is a role where you will always be encouraged to keep learning, as the AI/ML field is fast moving and constantly evolving.
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never require sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- 5+ years of non-internship professional software development experience
- 5+ years of experience programming with at least one software programming language
- 5+ years of experience leading design or architecture (design patterns, reliability and scaling) of new and existing systems
- 5+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
- Experience as a mentor, tech lead or leading an engineering team ...

Software Development Engineer, Annapurna Labs, Trainium Collectives

We are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations, the fundamental operations that enable AI to scale across multiple accelerators & servers. Most of our stack is C/C++ and relatively low level, so solid knowledge of Linux, kernels, and performant code is important. Experience with embedded systems is valued, and experience with high-speed networking or HPC interconnects is valued highly.
If you like solving hard problems, want to work with HPC and ML customers, iterate fast and deliver meaningful solutions at scale, then come join us! This truly is a role at the forefront of AI/ML: you’ll be working on features for the largest clusters, with the largest customers, for the largest AI models.
The org you would be joining is Annapurna Labs, an integral part of AWS that develops hardware and software components that are critical building blocks for EC2 infrastructure. Every instance in EC2 is running some type of hardware designed in Annapurna Labs. We specialize in designing software, systems and chips that optimize the AWS customer experience.
A day in the life
Annapurna Labs, a crucial part of AWS, is responsible for developing hardware and software components for EC2 infrastructure. Our team focuses on building networking solutions for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS.
We have mixed-discipline orgs; you’d be working side by side with infrastructure experts, hardware engineers, RTL engineers, scientists & architects. Our workforce spans the globe and is truly international; you’ll find yourself working side by side with individuals from numerous countries. We take mentorship seriously: you can expect senior mentorship and will be expected to mentor new and junior engineers. The pace is fast as we work on the latest advancements of AI/ML, but we take the time to bond as a team and enjoy the successes. We offer flexibility in working hours, and respect work-life balance (WLB) as a core org tenet. The team enjoys working with numerous principal-level engineers and closely with directors; career growth opportunities are certainly available. This is a role where you will always be encouraged to keep learning, as the AI/ML field is fast moving and constantly evolving.
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never require sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- 3+ years of non-internship professional software development experience
- 2+ years of non-internship experience with design or architecture (design patterns, reliability and scaling) of new and existing systems
- Experience programming with at least one software programming language ...

Sr. SoC Power Engineer, Annapurna Labs, Cloud Scale Machine Learning

Our Machine Learning Acceleration (MLA) team develops the Inferentia and Trainium SoCs that are used to power today’s AI workloads in datacenters all around the world. As a Sr. SoC Power Engineer, you’ll contribute to the project at the ground level by modeling and estimating power at every stage of the design, from early RTL to final netlist, and by driving ways to reduce the power consumption of our machine learning accelerators.
We’re searching for an experienced SoC Power engineer with a background in power analysis and a proven track record of handling challenges at scale. In this role, you’ll be working directly with architects, designers, verification engineers, software teams, and Physical Design experts, defining best practices to reduce power and to model power consumption with high accuracy.
Key job responsibilities
- Own full-chip power analysis & modeling at various stages of design (RTL to gate-level netlist)
- Develop and maintain dashboards for power rollups
- Work with designers, architects, verification engineers and Physical Design engineers to develop vectors for IR analysis, thermal analysis and power estimation
- Give feedback to designers and architects on how to reduce power
- Make power measurements in the lab and correlate them back to simulations
- Work with emulation engineers to model chip-level power consumption
A day in the life
Depending on the state of the project, you may find yourself working on the following:
- Working with designers, architects and verification engineers to come up with worst-case vectors for power analysis or IR analysis
- Root-causing low annotation or low switching coverage in vectors
- Developing & maintaining dashboards to track power consumption
- Running tools like PowerArtist to identify power-saving opportunities and feeding these back to the design team
- Working with emulation engineers to model chip-level power consumption and correlate it with simulation
- Doing post-silicon power measurements and correlating them with simulation
About the team
We are a start-up-like team where no one says "this is not my job" and you won't find anyone telling you to stay in your lane. This is a fast-paced, intellectually challenging position, and you’ll work with thought leaders in multiple technology areas. We encourage collaboration and teamwork with multiple teams and engineers, including architects, RTL designers, verification engineers, Physical Design engineers, emulation engineers and software engineers.
BASIC QUALIFICATIONS
- BS + 6 yrs, MS + 5 yrs, or PhD + 3 yrs in EE/CS
- Expert in power analysis tools like PrimePower, PowerArtist or similar
- Deep, circuit-level understanding of chip power
- Ability to give feedback to RTL designers and Physical Designers on how to reduce power
- Highly proficient with a scripting language (Tcl, Perl, Python or similar)
- Good understanding of Physical Design, EM/IR, Power Integrity, and Thermal at the die, package, board, and server level
- Some experience with lab equipment and capable of doing lab power analysis ...

Sr. Software Development Engineer, HPC/ML Networking Engineer, Annapurna Labs

We are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations, the fundamental operations that enable AI to scale across multiple accelerators & servers. Most of our stack is C/C++ and relatively low level, so solid knowledge of Linux, kernels, and performant code is important. Experience with embedded systems is valued, and experience with high-speed networking or HPC interconnects is valued highly.
If you like solving hard problems, want to work with HPC and ML customers, iterate fast and deliver meaningful solutions at scale, then come join us! This truly is a role at the forefront of AI/ML: you’ll be working on features for the largest clusters, with the largest customers, for the largest AI models.
The org you would be joining is Annapurna Labs, an integral part of AWS that develops hardware and software components that are critical building blocks for EC2 infrastructure. Every instance in EC2 is running some type of hardware designed in Annapurna Labs. We specialize in designing software, systems and chips that optimize the AWS customer experience.
A day in the life
Annapurna Labs, a crucial part of AWS, is responsible for developing hardware and software components for EC2 infrastructure. Our team focuses on building networking solutions for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS.
We have mixed-discipline orgs; you’d be working side by side with infrastructure experts, hardware engineers, RTL engineers, scientists & architects. Our workforce spans the globe and is truly international; you’ll find yourself working side by side with individuals from numerous countries. We take mentorship seriously: you can expect senior mentorship and will be expected to mentor new and junior engineers. The pace is fast as we work on the latest advancements of AI/ML, but we take the time to bond as a team and enjoy the successes. We offer flexibility in working hours, and respect work-life balance (WLB) as a core org tenet. The team enjoys working with numerous principal-level engineers and closely with directors; career growth opportunities are certainly available. This is a role where you will always be encouraged to keep learning, as the AI/ML field is fast moving and constantly evolving.
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never require sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- 5+ years of non-internship professional software development experience
- 5+ years of experience programming with at least one software programming language
- 5+ years of experience leading design or architecture (design patterns, reliability and scaling) of new and existing systems
- 5+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
- Experience as a mentor, tech lead or leading an engineering team ...
