Applied Scientist, AGI Info
Amazon is seeking an exceptional Senior Applied Scientist to join the AGI Info Content team. In this role, you will be at the forefront of developing and enhancing the intelligence of the AmazonBot crawler and content processing. The team is a key enabler of Amazon's AGI initiatives, such as data pipelines for Olympus model training and data collection for AGI Info grounding services. Our systems operate at web scale. This requires combining innovation that draws on state-of-the-art (SOTA) ML techniques with model optimization to sustain 100k+ requests/decisions per second. Your work will directly impact the quality and efficiency of our data acquisition efforts, ultimately benefiting millions of customers worldwide.

Key job responsibilities
- Design, develop, and implement advanced algorithms and machine learning models to improve the intelligence and effectiveness of our web crawler and content processing pipelines
- Collaborate with cross-functional teams to identify and prioritize crawling targets, ensuring alignment with business objectives
- Analyze and optimize crawling strategies to maximize coverage, freshness, and quality of acquired data while minimizing operational costs, and dive deep into data to select the highest-quality data for LLM training and grounding
- Conduct in-depth research to stay at the forefront of web acquisition and processing
- Develop and maintain scalable, fault-tolerant systems to handle the vast scale of Amazon's web crawling operations
- Monitor and analyze performance metrics, identifying opportunities for improvement and implementing data-driven optimizations
- Mentor and guide junior team members, fostering a culture of innovation and continuous learning

BASIC QUALIFICATIONS
- 3+ years of building models for business application experience
- PhD, or Master's degree and 4+ years of CS, CE, ML or related field experience
- Experience programming in Java, C++, Python or related language
- Experience in any of the following areas: algorithms and data structures, parsing, numerical optimization, data mining, parallel and distributed computing, high-performance computing ...