Question

Background: Machine-learned interatomic potentials are essential in molecular dynamics simulations of semiconductor materials. They can provide a high level of accuracy in the energies, forces and stresses needed to compute phonon properties such as thermal conductivity, which is one of the important characteristics of semiconductor devices used in high-temperature conditions. In the training outputs, two sets of data (training and test), in sheet1 and sheet2 respectively, are used to find the best potential parameters. In sheet3, three factors (X1, X2, X3) that affect the energy root mean square error (RMSE) are provided together with one example of energy RMSE.

Data: The data is from a machine-learned potential for gallium oxide, a compound semiconductor material. Each set of data contains the RMSE for energy, force, and stress. There are four columns in each set of data, namely training/test step number, energy RMSE, force RMSE, and stress RMSE. Each column contains 50 values.

Note: In Data Sheet1, Sheet2, Sheet3, replace the letters A - F with the last six digits of your student number. For example, if your student number is 1234567, then A = 2, B = 3, C = 4, D = 5, E = 6, and F = 7. Alternatively, if your student number is 123456, then A = 1, B = 2, C = 3, D = 4, E = 5, and F = 6.

Objectives: You are required to write a project report (template available on Canvas) that carries out a detailed statistical investigation of the experimental data discussed above. The project should provide an answer as to what the minimum values are for both training and test RMSE for the three parameters (energy, force, and stress). When writing the report for the project, structure it so that you cover and address all the following questions and issues in a way that reveals a progressively more significant understanding of the two data sets, for example:

1. For both training and test data, identify extreme outliers for energy, force, and stress, and comment on the impact on the mean, median and mode values after removing them.
Using the training/test data sets in sheet1 and sheet2, especially the energy column, construct an appropriate parametric and/or non-parametric test to assess the difference between training and test data. Discuss the differences found in the analysis above according to parametric and/or non-parametric test theory.

2. Using the data set in sheet3 and the technique of multiple least squares, estimate the β parameters of the following second-order response surface model:

$$Y = \beta_0 + \sum_{i=1}^{3} \beta_i X_i + \sum_{i=1}^{3} \beta_{ii} X_i^2 + \sum_{i=1}^{2} \sum_{j=i+1}^{3} \beta_{ij} X_i X_j + \varepsilon$$

where $Y$ is the energy RMSE, $X_1$ is factor 1, $X_2$ is factor 2, $X_3$ is factor 3, and $\varepsilon$ is the prediction error or residual. When writing up your analysis of this model, describe how well this model fits the data, which variables are statistically significant (important) and what meanings can be attached to the β parameters. State any assumptions that need to be made in assessing such statistical significance, and if appropriate carry out tests or construct scatter plots to validate these assumptions.

3. Derive a simplified version of the above model that includes only the statistically significant variables. When writing up your analysis, describe how well this simplified model fits the data, the meaning of the parameters, and the degree of accuracy (as described by a 95% confidence interval on the actual vs prediction plot). Make full use of any suitable 2D or 3D scatter plots when writing your final report.

Individual project: Summary of briefing
• Issue date: 13/11/2023.
• Submission deadline: by 17:30 on 15/12/2023.
• Individual project to be submitted via the Canvas submission portal. Zero tolerance for late submission, and likewise for plagiarism and collusion: all parties involved will be given zero marks at least, and further grave consequences may follow.
• Format: PDF file. Maximum of 8 pages (excluding the appendix and cover page).
• Abstract: Not more than 150 words.
• Introduction: Not more than 250 words.
• Appendix page: not more than two pages (single column).
• No screenshots of code in the main report.
• Concisely format all code in the appendix page.
• Format all figures using the appropriate techniques (refer to experimental sessions), and add suitable titles, X/Y labels and legends to all figures. For tables, clearly title columns and give the units of measurement.
• The overall report should be concise, complete, informative and written in a precise scientific tone highlighting clear objectives, methodology, a detailed discussion of main findings and a comprehensive reflection. All presented figures should be thoroughly and insightfully analysed using statistical concepts and themes. Presented results should logically address the research problem.

Distribution of Marks:
• Abstract: 5%
• Introduction: 10%
• Results and Discussion: 60%
• Reflection: 15%
• General Presentation: 10%
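
For objective 1, one possible starting point is sketched below. It flags extreme outliers with Tukey's outer fences (points more than 3 × IQR beyond the quartiles) and compares the mean, median and mode before and after removal. The brief does not define "extreme outlier", so the 3 × IQR rule is an assumption, and the function names and the `energy_rmse` values are hypothetical placeholders, not the assignment data.

```python
import statistics

def extreme_outlier_mask(values, k=3.0):
    """Mark points outside the Tukey outer fences Q1 - k*IQR and Q3 + k*IQR.

    k = 3.0 is the conventional choice for "extreme" outliers (assumption).
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]

def location_summary(values):
    """Mean, median and mode(s) of a sample."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),
    }

# Hypothetical energy RMSE values, NOT the data from sheet1 or sheet2.
energy_rmse = [0.012, 0.011, 0.013, 0.012, 0.010,
               0.014, 0.012, 0.011, 0.013, 0.250]

mask = extreme_outlier_mask(energy_rmse)
cleaned = [v for v, is_outlier in zip(energy_rmse, mask) if not is_outlier]

before = location_summary(energy_rmse)
after = location_summary(cleaned)
print("before:", before)
print("after: ", after)
```

In this toy sample only the mean moves noticeably once the flagged point is dropped, while the median and mode stay put, which is the kind of contrast the report is asked to comment on. For continuous RMSE values that rarely repeat, the mode may only be meaningful after rounding or binning.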

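For the second-order response surface model in objective 2, the sketch below shows one way the β parameters might be estimated by ordinary least squares with NumPy. The data are synthetic and generated from made-up coefficients (`true_beta`), purely so the fit can be checked; the helper name `design_matrix` and all numeric values are assumptions, and the sheet3 data would take the place of `X` and `y`.

```python
import numpy as np

def design_matrix(X):
    """Columns: 1, X1, X2, X3, X1^2, X2^2, X3^2, X1*X2, X1*X3, X2*X3."""
    x1, x2, x3 = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([
        np.ones(len(X)), x1, x2, x3,
        x1**2, x2**2, x3**2,
        x1 * x2, x1 * x3, x2 * x3,
    ])

# Synthetic stand-in for sheet3 (hypothetical factors and coefficients).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 3))
true_beta = np.array([1.0, 0.5, -0.3, 0.2, 0.1, 0.0, -0.2, 0.05, 0.0, 0.3])
y = design_matrix(X) @ true_beta + rng.normal(0.0, 0.001, size=30)

# Least-squares estimate of the ten beta parameters.
A = design_matrix(X)
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# Goodness of fit and per-coefficient t statistics.
residuals = y - A @ beta_hat
n, p = A.shape
rss = residuals @ residuals
r_squared = 1.0 - rss / np.sum((y - y.mean()) ** 2)
sigma2 = rss / (n - p)                      # residual variance estimate
cov_beta = sigma2 * np.linalg.inv(A.T @ A)  # covariance of beta_hat
std_err = np.sqrt(np.diag(cov_beta))
t_stats = beta_hat / std_err

print("beta_hat:", np.round(beta_hat, 3))
print("R^2:", round(float(r_squared), 4))
print("t statistics:", np.round(t_stats, 1))
```

Each t statistic would be compared against a Student t critical value with n - p degrees of freedom to decide which terms to keep in the simplified model of objective 3. That comparison assumes independent, normally distributed residuals with constant variance, which is what the residual scatter plots requested in the brief are meant to check.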