AI with Open and Scaled Data Sharing in Semiconductor Manufacturing

Robust data sharing in a collaborative data ecosystem (CDE) scales qualified data and widens access to untapped operational advantages for manufacturers.

TAKEAWAYS:
● Smart Manufacturing leverages large volumes of industry-qualified data to orchestrate applications comprehensively at multiple operational scales, but data access remains a barrier.
● Combining data sharing with Data-first site strategies recognizes that raw operational data must first be processed into AI-ready data for any AI, machine learning, or digital twin application to work.
● Manufacturers and engaged factory staff can agree on and execute cross-site data processes, guardrails, and shared workforce training for qualified, scalable, and trusted data sharing.

Smart Manufacturing (SM) refers to the orchestration of advanced digital technologies to construct scaled software systems. Data are used in artificial intelligence (AI), machine learning (ML), and digital twin (DT) applications to enable data-driven insights and decision-making, automation/autonomy, and scaled interoperability within and across physical and human control and management systems. This results in improved products, reduced energy and material usage, and enhanced productivity, responsiveness, and resilience in manufacturing operations, enterprises, and supply chains. The workforce becomes more effective, productive, and engaged.

The economic opportunities of data sharing, and the barriers to it, have been described in studies conducted since 2020.³ The potential operational value is substantial (Table 1), but data access remains a barrier. Processing of manufacturing data is often not prioritized, and when it is, it is rarely done well or consistently across applications. Manufacturing data also remain largely closed to application, tool, and training development. Like mined minerals, raw operational data hold little value in manufacturing until they are qualified, refined, concentrated, and processed in sufficient quantities.

Smart manufacturing that orchestrates and scales AI/ML/DT applications leverages large volumes of factory data to create AI-ready data, which are consistently and persistently contextualized, qualified, prepared, and engineered for various applications at multiple operational scales. Consistency in data processing is a key objective. A Data-first strategy emphasizes the need to convert raw operational data into AI-ready data for any application to be effective. All manufacturers, whether small, medium, or large, have valuable data and contribute to a broader manufacturing ecosystem. We refer to collaborative factory/company sites that share data as a collaborative data ecosystem (CDE).

Table 1: Industry-Defined Points of Economic Value for Smart Manufacturing Collaborative Data Ecosystems (CDEs) that can Scale Data and AI

A Workshop to Benchmark the Value of Data Sharing

A workshop titled “Artificial Intelligence with Open and Scaled Data Sharing in the Semiconductor Industry,” sponsored by the National Science Foundation (NSF) and supported by the National Institute of Standards and Technology (NIST), aimed to benchmark the potential of scaled data sharing while addressing significant barriers. It brought together 32 factory engineers and data scientists from 12 semiconductor manufacturers. Additionally, 27 participants, including data scientists from academic institutions across the country, industry experts on information technology (IT) and operational technology (OT) infrastructure, specialists in price analysis and equipment building, and government leaders in advanced manufacturing, contributed by challenging, proposing, and reviewing paths forward.

The workshop focused on existing technologies (no R&D) and on benchmarking near-term benefits. A Seagate/UCLA project team benchmarked the economic value points of the data processing and engineering needed to build a virtual metrology application that enhances productivity. Wafer production datasets from five etch machines at different sites, used for similar operations across different products, were qualified, categorized, prepared, and engineered into AI-ready data. A common data information model was developed for all five machine tools, using the CESMII SM Profile™⁴ to encode the data model in a standard digital form. Data information modeling was also demonstrated on chemical mechanical planarization (CMP) machines at three company sites.
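To make the idea of a common data information model concrete, here is a minimal Python sketch that maps two sites' local tag names and units onto one shared schema. Everything in it is hypothetical and for illustration only: the EtchRun fields, the site tag names, the unit conversions, and the to_common helper. It is not the CESMII SM Profile format that the workshop team used to encode its actual model.

```python
from dataclasses import dataclass, fields

# Hypothetical common data information model for one etch batch run.
# Field names and units are illustrative only; this is not the CESMII
# SM Profile the workshop team built.
@dataclass(frozen=True)
class EtchRun:
    site: str
    tool_id: str
    recipe_id: str
    chamber_pressure_mtorr: float
    rf_power_w: float
    etch_time_s: float

NUMERIC_FIELDS = {"chamber_pressure_mtorr", "rf_power_w", "etch_time_s"}

# Hypothetical per-site mappings from local tag names to common field names.
SITE_TAG_MAPS = {
    "site_a": {"Tool": "tool_id", "Recipe": "recipe_id",
               "PressureTorr": "chamber_pressure_mtorr",
               "RFPower": "rf_power_w", "EtchSec": "etch_time_s"},
    "site_b": {"eqp": "tool_id", "rcp": "recipe_id",
               "press_mtorr": "chamber_pressure_mtorr",
               "rf_w": "rf_power_w", "etch_ms": "etch_time_s"},
}

# Hypothetical unit conversions, keyed by (site, local tag).
UNIT_CONVERSIONS = {
    ("site_a", "PressureTorr"): lambda v: v * 1000.0,  # Torr -> mTorr
    ("site_b", "etch_ms"): lambda v: v / 1000.0,       # ms -> s
}

def to_common(site: str, record: dict) -> EtchRun:
    """Rename a site's local tags and convert units into the common model."""
    values = {"site": site}
    for local_name, common_name in SITE_TAG_MAPS[site].items():
        if local_name not in record:
            continue  # reported below as a missing common field
        raw = record[local_name]
        if common_name in NUMERIC_FIELDS:
            raw = float(raw)
            convert = UNIT_CONVERSIONS.get((site, local_name))
            if convert is not None:
                raw = convert(raw)
        values[common_name] = raw
    missing = {f.name for f in fields(EtchRun)} - set(values)
    if missing:
        raise ValueError(f"{site} record is missing {sorted(missing)}")
    return EtchRun(**values)

# Usage: comparable runs reported in two sites' local conventions.
a = to_common("site_a", {"Tool": "E1", "Recipe": "R7", "PressureTorr": 0.015,
                         "RFPower": 500, "EtchSec": 42})
b = to_common("site_b", {"eqp": "E4", "rcp": "R7", "press_mtorr": 15.0,
                         "rf_w": 500, "etch_ms": 42000})
```

This sketch raises an error on missing fields instead of filling in defaults. That is one way to keep contextual gaps from silently entering pooled data.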

Executing on Consistent Data Processing as a CDE

Executing as a CDE required commitment to a governance structure that ensured trust in site qualifications, consistent data processes, security, IP protections, and model validations. Factories and companies needed to collaborate on data preparation and AI/ML model building while maintaining autonomy over their products and applications. Factory site data engineers and scientists had to work together on solutions. Governance was supported by a “mindset” that challenged conventional thinking. Adhering to eight execution principles was critical for sustaining the ecosystem effort (Table 2).

Table 2: Eight Key Execution Principles for Industry Data Sharing

Business Value Basis for the Ecosystem to Form

This coalition established the CDE as a “market-driven, business entity.” This study demonstrates a CDE as a bottom-up, business-focused entity through which factories increase the value of site data in collaboration with other factories and companies. The CDE creates new business, revenue, and service opportunities based on data value and on the benefits of jointly preparing data and building models. Interest in the CDE began with a viable business opportunity. Identifying specific operational benefits was the crucial next step. The execution principles then propelled the CDE forward.

An Overall Finding about Data Processing Consistency

We highlight the key finding that consistency in data preparation and refinement is best achieved through a workflow of repeatable data processing steps. These steps proceed counterclockwise in Figure 1: (1) eliminating contextual and formatting inconsistencies with a common data information model, as a collaborative step; (2) ensuring consistent qualification (operational quality) and formatting (including categorization of key operational features), as on-site steps; (3) maximizing pooled data processing through a workflow of collaborative steps; and (4) validating and deploying at each site, using shared solutions and methods that each site applies individually.

Maximizing collaboration (shown in blue) while minimizing inconsistencies from site steps (in black) was essential for data processing consistency. The figure also emphasizes that consistency depends on selecting and applying methods consistently at each step. The cycle begins with the common data information model.

Figure 1. Consistent and Collaborative Data Refinement and Model Building
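The on-site and pooled steps of the workflow above can be sketched in Python. This is a minimal illustration under stated assumptions, not the workshop team's pipeline. The record fields, qualification rules, limits, and power bands are all hypothetical. The sketch shows every site applying the same jointly agreed rules before pooling, so every run in the pooled super dataset has passed the same checks.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict

@dataclass(frozen=True)
class QualificationRule:
    name: str
    check: Callable[[Record], bool]

# Shared rule set that every site applies identically (hypothetical limits).
REQUIRED = ("tool_id", "rf_power_w", "etch_time_s", "flatness_pass")
SHARED_RULES = (
    QualificationRule("complete", lambda r: all(r.get(k) is not None for k in REQUIRED)),
    QualificationRule("rf_power_in_range", lambda r: 100.0 <= r["rf_power_w"] <= 2000.0),
    QualificationRule("etch_time_positive", lambda r: r["etch_time_s"] > 0),
)

def qualify(records, rules=SHARED_RULES):
    """On-site step (2): keep records that pass every shared rule.

    Rules are checked in order and checking stops at the first failure,
    so "complete" guards the later rules. Rejections are counted under
    the first rule that fails.
    """
    kept, rejected = [], {}
    for r in records:
        failed = next((rule.name for rule in rules if not rule.check(r)), None)
        if failed is None:
            kept.append(r)
        else:
            rejected[failed] = rejected.get(failed, 0) + 1
    return kept, rejected

def categorize(r):
    """On-site step (2): tag a key operational feature (hypothetical RF power bands)."""
    return "low_power" if r["rf_power_w"] < 800.0 else "high_power"

def pool(sites):
    """Collaborative step (3): combine every site's qualified, categorized runs."""
    super_dataset = []
    for site, records in sites.items():
        kept, _ = qualify(records)
        super_dataset.extend({**r, "site": site, "power_band": categorize(r)} for r in kept)
    return super_dataset

# Usage with two hypothetical sites, each contributing one bad run.
sites = {
    "site_a": [{"tool_id": "E1", "rf_power_w": 500.0, "etch_time_s": 40.0, "flatness_pass": True},
               {"tool_id": "E1", "rf_power_w": 5000.0, "etch_time_s": 40.0, "flatness_pass": False}],
    "site_b": [{"tool_id": "E4", "rf_power_w": 900.0, "etch_time_s": 38.0, "flatness_pass": True},
               {"tool_id": "E4", "rf_power_w": None, "etch_time_s": 38.0, "flatness_pass": True}],
}
super_dataset = pool(sites)
```

Keeping the rules as one shared rule set, rather than as separate code at each site, is one way to make consistent method selection checkable. If the sites agree to change a limit, the change applies at every site at once.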

Key Benchmark Performance Findings

SM and AI/ML/DT systems can be implemented cost-effectively. The CDE process was benchmarked against processing data and building an ML model independently for each machine:

● Batch run datasets from different sites were combined into a qualified, consistently processed, and richer super dataset of 100,000 batch runs.
● ML model performance in predicting wafer flatness pass or fail was 30 percent to 50 percent better with aggregated data than with siloed training.
● Pooled processing could achieve 3x cost savings in staffing and avoid an increase of 4 full-time equivalents (FTEs) in headcount across all sites.
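The two staffing figures above can be checked against each other with simple arithmetic, provided one assumption holds. The source does not state the baseline. We assume that "3x cost savings" means siloed staffing is three times pooled staffing and that the 4 avoided FTEs are measured against that same siloed baseline. Here S and P are labels introduced only for this check.

```latex
% S = staffing (FTEs) for siloed, per-machine processing across all sites
% P = staffing (FTEs) for pooled processing across all sites
\begin{aligned}
\frac{S}{P} &= 3 \quad\text{and}\quad S - P = 4 \\
\Rightarrow\; 3P - P &= 4 \;\Rightarrow\; P = 2, \quad S = 6
\end{aligned}
```

Under that reading, pooled processing would need about 2 FTEs where siloed processing would need about 6. If the two figures use different baselines, this inference does not hold.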

Factory-floor staff from various sites collaborated to create a common data information model for all machines, reflecting a shared expert understanding of machine operation. Building the common data information model made it easier to co-develop methods to consistently qualify data, protect IP, categorize data, address security, and share training. Workforce training should ultimately be guided by the business value of data sharing. However, initial on-the-job training programs on data processing are needed to drive the value of data for AI/ML/DT applications.
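The source does not say how IP was protected. One common technique, shown here as a hypothetical example, is keyed pseudonymization. Each site replaces sensitive identifiers with HMAC-SHA256 tokens, using a key that never leaves the site. The choice of sensitive fields, the token length, and the key handling are all assumptions.

```python
import hashlib
import hmac

# Hypothetical: identifiers a site treats as IP-sensitive before pooling.
SENSITIVE_FIELDS = ("tool_id", "recipe_id")

def pseudonymize(record, site_key, fields=SENSITIVE_FIELDS):
    """Return a copy of record with sensitive identifiers replaced by tokens.

    Tokens are keyed HMAC-SHA256 digests, truncated to 16 hex characters
    for readability. The same key maps the same identifier to the same
    token, so pooled runs from one tool stay linkable. Without the site's
    key, other participants cannot recompute the mapping.
    """
    out = dict(record)
    for name in fields:
        if out.get(name) is not None:
            mac = hmac.new(site_key, str(out[name]).encode("utf-8"), hashlib.sha256)
            out[name] = mac.hexdigest()[:16]
    return out

# Usage: in practice the key would live in a site-managed secret store.
site_key = b"site-a-example-key"  # hypothetical placeholder key
run1 = pseudonymize({"tool_id": "ETCH-07", "recipe_id": "R7", "rf_power_w": 500.0}, site_key)
run2 = pseudonymize({"tool_id": "ETCH-07", "recipe_id": "R9", "rf_power_w": 510.0}, site_key)
```

Process values such as rf_power_w pass through unchanged, so the pooled data remain useful for modeling while tool and recipe names stay private to each site.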

Every success in this demonstration project was driven by the value and availability of consistently processed data. Focusing on shared data processing and engineering facilitated algorithm development and validation. Pooling factory site data could produce better data without increasing headcount or service requirements. If data are qualified, consistent, scalable, and trusted, cross-operational advantages follow. Cross-site, cross-factory, and cross-company data sharing is doable. A sufficient intersection of individual values, along with ways to address risks and barriers, can be found. There is a line of sight to shared data inventories categorized by process conditions.

About the Authors

Sthitie Bom is Vice President of Factory Data, Analytics, and Applications at Seagate.

Jim Davis is UCLA Vice Provost IT (CIO/CTO) Emeritus.

Notes:

1. The content in this article is based upon work supported by the National Science Foundation (NSF) under Grant 2334590 and further supported by the National Institute of Standards and Technology (NIST). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NSF or NIST.

2. The detailed NSF-sponsored, NIST-supported workshop report is forthcoming. Workshop Organizing Committee: Sthitie Bom, Seagate Technology (co-chair); Jim Davis, UCLA (co-chair); Said Jahanmir, Office of Advanced Manufacturing, NIST; Bruce Kramer, Office of Advanced Manufacturing, NIST; Don Ufford, Office of Advanced Manufacturing, NIST (at the time the work was done); Greg Vogl, Engineering Laboratory, NIST.

3. Towards Resilient Manufacturing Ecosystems through Artificial Intelligence – Symposium Report, NIST Advanced Manufacturing Series, NIST AMS 100-47, September 2022; Options for a National Plan for Smart Manufacturing, National Academies of Sciences, Engineering, and Medicine, Consensus Study Report, 2024.

4. CESMII (Clean Energy Smart Manufacturing Innovation Institute) is sponsored by the U.S. Department of Energy (DOE). See https://www.cesmii.org/technology/sm-profiles/ for further information on the CESMII SM Profiles and the associated SM interoperability platform.