2026 Summer Intern - Regev Lab - Bayesian Optimization with LLMs

Genentech
United States, California, South San Francisco
Feb 04, 2026
The Position

Department Summary

Many real-world optimization problems, such as the design of experiments in biological or chemical domains, the tuning of hyperparameters in machine learning systems, and the allocation of resources under uncertainty, are both expensive and high-dimensional. Traditional algorithms for such black-box or bandit optimization rely primarily on carefully chosen surrogate models, including Gaussian Processes, random forests, or Bayesian neural networks, to guide the search. While these methods provide a foundation for uncertainty quantification, they often struggle to incorporate the qualitative insights and latent domain knowledge available through modern generative models such as LLMs.

The project aims to develop a unified framework that integrates probabilistic search with LLM suggestions, retaining principled control over the search while leveraging external information. The research will address the fundamental challenge of balancing data-driven discovery with potentially noisy or heuristic insights through a principled synthesis of robust surrogate models and agentic reasoning.

This internship position is located in South San Francisco, on-site.

The Opportunity

A central component of the internship involves establishing formal performance guarantees and convergence properties for the proposed methodology. By demonstrating that the framework maintains reliable behavior even when it incorporates non-traditional suggestions, the project aims to ensure that the method scales effectively as experimental data accumulates. The methodology will be validated on high-throughput functional genomics data, where efficient search is critical because of the scale and cost of physical experiments. By establishing a robust loop that integrates generative insights with experimental results, the research aims to develop new methods for automated discovery suitable for submission to machine learning or computational biology venues.

Program Highlights

  • Intensive 12-week, full-time (40 hours per week), paid internship.

  • Program start dates are in May/June 2026.

  • A stipend, based on location, will be provided to help alleviate costs associated with the internship.

  • Ownership of challenging and impactful business-critical projects.

  • Work with some of the most talented people in the biotechnology industry.

Who You Are

Required Education:

  • Must be an enrolled student pursuing a Master's degree or a PhD.

Required Majors: Computer Science, Electrical Engineering, Machine Learning, Artificial Intelligence, Computational Biology, or a closely related field.

Required Skills:

  • Advanced Python, ML Frameworks, and Experiment Management: Proficiency in Python and experience managing high-dimensional datasets and computationally intensive optimization loops using cluster or HPC job schedulers (e.g., SLURM) and multi-GPU environments; experience with ML and Bayesian optimization libraries (e.g., PyTorch, GPyTorch, BoTorch) is preferred.

  • Bayesian Optimization and Probabilistic Modeling: A strong background in Gaussian Processes, acquisition functions, and the theoretical underpinnings of Bayesian optimization, including familiarity with regret-based analysis and convergence proofs.

  • LLM Integration and Agentic Workflows: Experience implementing and evaluating LLM-based agents, specifically for structured knowledge retrieval or hypothesis generation in scientific domains.

  • Computational Biology and Genomic Data (Optional): Data-handling skills relevant to high-throughput screens, including the analysis of single-cell RNA-seq (e.g., Perturb-seq) or related transcriptomic data to evaluate model performance.

  • Research Rigor and Documentation: Proven ability to read and implement complex mathematical and biological research papers, design reproducible experimental pipelines, and document theoretical derivations clearly.

  • Scientific Communication: Strong skills in synthesizing complex results at the intersection of machine learning and biology for presentation to cross-functional research teams.

Preferred Knowledge, Skills, and Qualifications

  • Excellent communication, collaboration, and interpersonal skills.

  • Complements our culture and the standards that guide our daily behavior & decisions: Integrity, Courage, and Passion.

Relocation benefits are not available for this job posting.

The expected salary range for this position based on the primary location of California is $50.00 hourly. Actual pay will be determined based on experience, qualifications, geographic location, and other job-related factors permitted by law. This position also qualifies for paid holiday time off benefits.

Genentech is an equal opportunity employer. It is our policy and practice to employ, promote, and otherwise treat any and all employees and applicants on the basis of merit, qualifications, and competence. The company's policy prohibits unlawful discrimination, including but not limited to discrimination on the basis of Protected Veteran status or disability status, consistent with all federal, state, and local laws.

If you have a disability and need an accommodation in relation to the online application process, please contact us by completing the Accommodations for Applicants form.