The Jackson Hole Higher Education Group, Inc.
CyberCampus Project
Technical Document 3.3
Initialization of the Student, Faculty, Department, and Financial Databases
Table of Contents
1. Introduction
2. Macro Considerations
2.1 Steps in the Initialization Processes
2.2 Non-Uniform Random Numbers
3. Departmental Database
3.1 Departmental Master
3.2 Curriculum Requirements
3.3 Faculty Templates
3.4 Course Templates
4. Student Database
4.1 Student Numbers
4.2 Transition Probabilities
4.3 Year In Program
4.4 Gender-ethnic Group
4.5 Talent Indices
4.6 Financial Aid
4.7 Major Field
4.8 Elective Course Preferences
4.9 Courses Taken
4.10 Satisfaction Indices
5. Faculty Database
5.1 Teaching, Scholarship, and Research Talent
5.2 Principal Investigators and Average Research Project Size
5.3 Faculty Numbers
5.4 Gender-Ethnic Percentages
5.5 Average Salary
5.6 Transition Probabilities
5.7 Generating the Faculty Sims
6. Conforming the Financial Data
7. Benchmark Institutions
1. Introduction
Higher Education is a computer-based simulation game under development that targets both the institutional professional and the interested layperson to participate in leadership challenges in a college or university setting. Players set, monitor, and modify a variety of institutional parameters and policies, allocate resources as they see fit, and watch as results continually unfold. The game provides an opportunity to experiment and succeed or fail in a safe and entertaining fantasy environment. While Higher Education is necessarily a caricature of real academic life, it is grounded in authentic data and will provide serious lessons in higher education.
The game will be driven by a sophisticated simulation engine, described in Technical Documents 2.1 through 2.5, that models enrollment management, resource allocation and finance, academic operations, physical plant activities, and performance indicators. Several more algorithms deal with the game’s initialization procedures. The present Technical Document is the third in the "initialization" series:
• Td_3.1, "The Simulation Engine: Initialization," describes the game’s institutional database and the procedures for extracting the initialization and benchmark-institution datasets;
• Td_3.2, "Student Segmentation Analysis," describes the definition of student segments;
• Td_3.3, "CyberCampus Database Initialization," provides templates for most of the required judgmental inputs and describes how they are combined with the data of Td_3.1 to initialize the game's primary operating databases:
– student database
– faculty database
– departmental database
– financial database
– benchmark-institution database
Much of the material described herein is illustrated in the Excel file "HE.GDP_init", which readers may wish to reference as they go along. The file contains six sheets. The first gives a tentative list of departments and describes the overall approach to parameterization. The second describes the templates used to elaborate faculty information, and the third describes the templates for department-level course information. Data on departmental course requirements are provided in the fourth sheet, and data on student preferences for majors and courses are provided in the fifth. The aforementioned inputs come together in the last sheet, "Faculty_Initialization."
Purple shading on the spreadsheets indicates figures for which data or judgments will have to be provided. The numbers contained in the spreadsheets are intended only to illustrate structure. We hope our advisors can help identify data sources where they exist and help make the needed judgments where no data can be found. Ideas about potential data sources are noted where applicable.
2. Macro Considerations
This section describes the overall flow of the initialization process and provides the algorithm for drawing non-uniform random numbers needed for a number of the steps.
2.1 Steps in the Initialization Processes
Initialization will proceed according to the following steps:
Step 1. Obtain the player's desired institutional characteristics or scenario and extract the initialization dataset and benchmark dataset using the procedures described in Td_3.1. (The information in these datasets is based on actual institutions.)
Step 2. Load the departmental database (Section 3 herein) with information on the departments to be included in this play of the game.
Step 3. Generate sims for the initial student database as described in Section 4. Student sim generation requires data from the initialization dataset and the departmental database.
Step 4. Generate sims for the faculty database as described in Section 5. Faculty sim generation requires data from the initialization dataset, the departmental database, and the student database.
Step 5. Conform the initial financial database to the department, student, and faculty databases.
Step 6. Generate the initial benchmark database based on all of the above.
2.2 Non-Uniform Random Numbers
Initialization of the student, faculty, and benchmark-institution databases will be based in part on random numbers selected from appropriate probability distributions. Rectangularly distributed random numbers present no problem because their generators are widely available. However, other forms may require special procedures.
Unless otherwise noted, the game will use the procedure for calculating "normal (Gaussian) deviates" described on pp. 289-290 of William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery, Numerical Recipes in C, Second Edition (Cambridge University Press, 1992).
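As an illustration only, the sketch below shows one standard way of turning rectangular (uniform) random numbers into normal deviates: the polar form of the Box-Muller transformation, which we understand to be the approach taken by the cited routine. The function names `gaussian_deviate` and `scaled_deviate` are our own and hypothetical; the game's code need not look like this.

```python
import math
import random


def gaussian_deviate(rng: random.Random) -> float:
    """Return one standard-normal deviate via the polar Box-Muller method."""
    while True:
        # Pick a point uniformly in the square [-1, 1) x [-1, 1).
        v1 = 2.0 * rng.random() - 1.0
        v2 = 2.0 * rng.random() - 1.0
        rsq = v1 * v1 + v2 * v2
        # Keep only points strictly inside the unit circle, excluding the origin.
        if 0.0 < rsq < 1.0:
            break
    factor = math.sqrt(-2.0 * math.log(rsq) / rsq)
    # The method yields two independent deviates (v1*factor and v2*factor);
    # this sketch returns one and discards the other for simplicity.
    return v1 * factor


def scaled_deviate(rng: random.Random, mean: float, std_dev: float) -> float:
    """Shift and scale a standard deviate to the requested mean and spread."""
    return mean + std_dev * gaussian_deviate(rng)
```

Passing an explicitly seeded `random.Random` instance keeps an initialization run reproducible, which should help when testing the steps that follow.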
3. Departmental Database
The departmental database will be constructed from the initialization dataset and the four sets of additional inputs described below. Only the data for the departments to be included at the start of the game will be loaded at initialization. The departmental database will be augmented or consolidated with the addition or merger of departments as described below.
3.1 Departmental Master
The Dept_Master spreadsheet lists the 28 departments that currently are candidates for inclusion in the game. We could probably accommodate another 20 or so, though each addition adds to the burden of data collection. Two "field identifiers" are associated with each department: "IPEDS field" is the field in the IPEDS database in which the department's degrees are included; "GAME field" represents the field definitions to be used in the game. The game fields’ detailed definitions can be adjusted until the game's database is frozen, but it will get progressively harder to change their number or broad characteristics (which affect the type of building, icon, and/or sound effect used to represent the field).
While the game will include data on 28 or more departments, a much smaller number (perhaps 6 to 12) will be active in the game environment at any one time. The Player's initial inputs determine which departments will be present at the start of the game. (The procedure for doing this has yet to be determined.) The Player will be able to add new departments and/or merge old ones as described below. The ability to create and destroy departments represents one of the game's most important strategic elements.
• A new department can be created by selecting from the unused departments in the database. The Player will have to hire faculty (the new department will start with an empty faculty roster), and we expect that a building will have to be reallocated or constructed to house them. The player will be able to view student and research demand data for potential new departments before making their choices.
• Departments can be merged by Player command. All existing faculty will become members of the new department, effective at the beginning of the first semester following the announcement. The new department will carry the name and the characteristics of the merged department contributing the largest number of tenure-line faculty. Faculty and student morale will be adversely affected by the merger, with the degree of adverse effect being largest when the number of contributed tenure-line faculty is nearly equal for the two merger partners. (A response function will be defined.) Players might wish to "starve" the weaker merger partner by encouraging resignations and withholding new appointments for a few years before the merger is announced. This will tend to contain the adverse morale effects within the weaker partner.
Each department will have two kinds of data associated with it. The first is a set of parameters, described below, that determine key variables that apply to the department as a whole. The second designates one or more "templates" for elaborating certain variables at a high level of detail (for example, faculty data in terms of age/rank and gender-ethnic categories). Using templates, which can apply to more than one department, avoids the need to define an extensive set of parameters for each of the game's 28 or more departments. We will define as many templates of each type as necessary, and individual departments can be given unique templates where this is worth the cost. Designating two or more templates for a given variable produces an equally weighted linear combination of the template values. Mixing a small number of templates thus can produce a wide variety of different departmental patterns.
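The template-mixing rule can be sketched as follows. This is a minimal illustration, assuming each template is stored as a mapping from category name to value; the function name `mix_templates`, the category labels, and the numbers are hypothetical.

```python
from typing import Dict, List

Template = Dict[str, float]


def mix_templates(templates: List[Template]) -> Template:
    """Return the equally weighted linear combination of the given templates."""
    if not templates:
        raise ValueError("at least one template is required")
    keys = templates[0].keys()
    for template in templates[1:]:
        if template.keys() != keys:
            raise ValueError("templates must define the same categories")
    weight = 1.0 / len(templates)
    return {k: weight * sum(t[k] for t in templates) for k in keys}


# Illustrative rank distributions only; these are not figures from the spreadsheet.
template_a = {"assistant": 0.30, "associate": 0.30, "full": 0.40}
template_c = {"assistant": 0.10, "associate": 0.30, "full": 0.60}
mixed = mix_templates([template_a, template_c])
# mixed is roughly {"assistant": 0.20, "associate": 0.30, "full": 0.50}
```

Designating a single template simply returns that template's values, so departments with unique templates need no special handling.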
The following department-level parameters are defined on the Dept_Master sheet.
Faculty salary multiplier: adjusts the institution-wide average salaries to account for field differences (e.g., Business salaries are higher than English salaries). Data on faculty salaries by field are available in the Chronicle of Higher Education.
Normal teaching load multiplier: adjusts the institution's "overall normal teaching load" for field. (A procedure described in Section 4 calculates the normal teaching load from the institution-wide sponsored research and doctoral student ratings.) For example, English faculty might average twenty percent more courses than the institution-wide average. The teaching loads for the individual faculty sims must be integer but the averages used to drive the individual loads need not be integer. Faculty surveys (e.g., those conducted by Sandy Astin at UCLA) may provide data on the ratios of average teaching loads among fields.
Normal research per faculty multiplier: adjusts the figures for institution-wide average sponsored research per faculty member to account for field differences (e.g., Chemistry has more sponsored research than History). NSF or NRC may have relevant data; however, the normalization has to be right (e.g., research dollars from all sources, or perhaps all Federal sources, per faculty member, not per principal investigator).
Female multiplier: adjusts the institution-wide female percentages to account for field differences (e.g., Education may have a higher proportion of women than Engineering). Data may be available in the Chronicle of Higher Education.
Minority multiplier: adjusts the institution-wide minority percentages to account for field differences (e.g., Gender-Ethnic Studies may have a higher proportion of minorities than Physics). Data may be available in the Chronicle of Higher Education.
Average teaching talent: the department-wide average for teaching talent of the initial faculty. No data are likely to exist, so the talent indices will have to be set by judgment.
Average scholarship talent: the department-wide average for scholarship talent of the initial faculty.
Average research talent: the department-wide average for research talent of the initial faculty.
Percent of faculty who may be principal investigators: a measure of the sponsored research market potential for the field. This is the maximum percentage, attained if all faculty have research talent equal to 100.
Normal research project size: a measure of the granularity of research sponsorship in the field. This represents the mean of the distribution of research project sizes.
Multipliers for doctoral graduation percent and mean times to degree and dropout: adjusts the institution-wide figures to the departmental level.
Tuition rate multiplier: allows for differentiation of the institution-wide tuition rate obtained from the initialization dataset. Tuition rates for undergraduates and non-matriculated students are assumed to equal the institution-wide rate. However, masters and doctoral students in certain fields may pay a different amount depending on the tuition rate multiplier.
All of the above will be displaced by a small random deviation when loaded into the database of relevant departments. The deviations will be based on a single-humped (non-rectangular) probability distribution. The departmental figures will be used as means. The variance factors are shown in row 32.
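A minimal sketch of the displacement step follows. It ASSUMES that the "variance factor" can be read as a relative standard deviation (spread as a fraction of the mean) and that a normal distribution is an acceptable single-humped form; neither is confirmed by the spreadsheet. The name `perturb_parameter` is hypothetical.

```python
import random


def perturb_parameter(mean: float, rel_std_dev: float, rng: random.Random) -> float:
    """Displace a departmental parameter by a small bell-shaped deviation.

    ASSUMPTION: rel_std_dev is the spread expressed as a fraction of the mean.
    """
    value = rng.gauss(mean, rel_std_dev * abs(mean))
    # Multipliers and talent averages cannot be negative, so clip at zero.
    return max(0.0, value)


rng = random.Random(2024)
# Example: a salary multiplier of 1.20 displaced with a 5 percent relative spread.
salary_multiplier = perturb_parameter(1.20, 0.05, rng)
```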
The Dept_Master sheet also contains a column labeled "total courses taught per semester." These data are provided for testing purposes. Section 4.9 describes how the figures will be calculated as part of the student database initialization.
3.2 Curriculum Requirements
The Curric_Requirements spreadsheet contains the information needed to execute the course choice algorithm as described in Sections 5.2-5.4 of Td_2.2. Data are provided for each department (including the "General Education" pseudo-department), course depth group, and student level.
The structure will be illustrated using the first section of the spreadsheet, "Requirements for the Bachelors Degree," which applies to student levels 1 and 2. Departments and course depth categories are listed in the first column. This is followed by two "control totals" (columns B and C), which aid data generation but need not be included in the game's database. They are followed by the data for required and elective courses.
The first column under "required courses," labeled "own department" (column E), refers to the courses taken in the same department as the major (shown on the row). For example, English majors must take 2 English courses at depth level one, 2 more courses at depth level two, and 1 additional course at depth level three. "Own-department" does not apply to general education.
The above illustrates a slight deviation from the specification in Section 5.2 of Td_2.2. It seems easier to produce data for the "induced course requirements matrix" directly rather than computing it from an underlying matrix. Hence the data in Curric_Requirements refer to the "induced course requirements matrix" of Td_2.2, which will now simply be called the "course requirements matrix."
Continuing with the requirements for English majors, we see that one course at each of the three levels is required under the "Humanities" heading. This refers to "humanities courses other than English." Adding the required language studies courses produces the control totals in column B: 10 courses required in all, out of a total of 32 courses (4 per semester times 8 semesters) needed for the bachelors degree.
Like all other undergraduates, English majors must complete the General Education Requirement. This amounts to 8 courses in total, 7 at depth level one and 1 at depth level two (rows 5-8). These requirements are additive to those of the major.
Columns O-R define the minimum number of electives to be taken in each year of a student's program. For convenience we include the first-year figure (1 course) under General Education because it applies to all students. The later-year figures will vary by major—for English, the minimum is 2 courses each year. According to the choice algorithm, students are free to take more than the minimum number of electives. The minimum spaces the satisfaction of requirements over time. Otherwise the algorithm would seek to satisfy all requirements before branching out to electives, which does not reflect real-world behavior.
The requirements and electives for masters and professional degrees (student level 3) begin at row 122. The structure is the same as for undergraduates except there is no General Education requirement and, because the program is only one year long, no need for a minimum number of electives per year. Graduate level courses (depth level four) have been added, and depth levels one and two have been dropped. The doctoral requirements (student level 4) also drop the depth-three courses because doctoral students take only graduate courses.
No data are provided for the non-matriculated students (student level 5). These students are not seeking a degree, so they are assumed to have no requirements. Their course choices are governed entirely by the student preference data described in Section 4.
We know of no consolidated data source on curricular requirements. However, information on requirements could be obtained by sampling course catalogs. We are not trying to differentiate requirements by type of institution, although this might be a useful refinement for a future game version. Hence it should not be too difficult to get approximate data for representative institutions.
3.3 Faculty Templates
The faculty templates elaborate the department-level parameters (described in Section 3.1) to the faculty age/rank categories. Three sample templates are illustrated on the "Faculty_Templates" sheet.
The rows of each template correspond to the faculty age/rank categories used to characterize the faculty sims. They are the same as discussed in earlier technical documents except that the Full Professor category has been expanded to three age-related sub-groups: ages 41-50; 51-60; and over 60. Age ranges for the other ranks are provided as well. Once a new faculty sim ("new" at initialization or "new" through hiring) has been placed in a rank/age category the actual age will be determined by drawing a random number within the indicated age range.
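The age draw for a newly placed faculty sim can be sketched as below. The three Full Professor bands come from the text; the upper bound of 70 for the "over 60" band, the category keys, and the function name `draw_age` are assumptions made for illustration, and the ranges for the other ranks would be read from the Faculty_Templates sheet.

```python
import random
from typing import Dict, Tuple

# Inclusive age ranges by rank/age category.
AGE_RANGES: Dict[str, Tuple[int, int]] = {
    "full_41_50": (41, 50),
    "full_51_60": (51, 60),
    "full_over_60": (61, 70),  # ASSUMPTION: upper bound chosen for illustration
}


def draw_age(category: str, rng: random.Random) -> int:
    """Draw an integer age uniformly within the category's inclusive range."""
    low, high = AGE_RANGES[category]
    return rng.randint(low, high)


rng = random.Random(42)
age = draw_age("full_51_60", rng)  # some integer between 51 and 60
```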
Column B gives the "base rank/age" distribution for the department to be represented by the template. This will be combined with the institution-wide rank percentages in the "Faculty_Initialization" sheet. Columns C and D extend the gender and minority multipliers to the rank/age categories. For example, female faculty may have ten percent greater incidence at the assistant professor level than for the department overall. Columns E, F, and G extend the salary multiplier. The first column gives the average salary for white males in the rank/age category. The next two columns adjust these figures for gender and ethnicity. Finally, columns H, I, and J extend the talent indices. Data on minority and female percentages by rank and faculty salaries by rank and gender/ethnic category may be available in the Chronicle of Higher Education.
The next column gives the multiplier for the normal teaching load of this faculty age/rank category. In the sciences, for example, the department might assign assistant professors an average of only one-third the course load of tenured faculty in order to give them time to launch their research program. Data on teaching loads by rank, perhaps cross-classified by field, may turn out to be available in published faculty surveys.
The last six columns present data used in the determination of faculty discretionary time. The columns call out the discretionary time categories described in Section 7.1 of Td_2.2. Each row of data presents the "base discretionary time preferences" for an age/rank category. "Base" refers to a professor with teaching, scholarship, and research talent indices all equal to 50 (midway between 0 and 100). The figures are expressed as percentages of the available work-week, after deducting teaching contact hours, devoted to the activity. (The available work-week will vary during the game as a function of the faculty member's morale.) Actual discretionary time preferences will vary as a function of the faculty member's talent indices, the Player's policy preferences, and perhaps other factors. A preference function will be defined.
While the aforementioned variables are listed across the page under each template heading, the templates may be mixed freely. For example, a department may be specified as using Template A for course load, Template C for age/rank, and a linear combination of B and C for the three talent multipliers.
3.4 Course Templates
This sheet contains a single section: variables dealing with teaching methods. The course types defined in Section 5.1 of Td_2.2 are shown on the rows. The first column gives the department's "target preference" for the teaching method, used in Step 7 of Section 5.3 of Td_2.2. The second column gives the "normal class sizes" used in Steps 3 and 4 of Section 5.4 of Td_2.2.
4. Student Database
The next initialization step is to populate the student database with the appropriate numbers and types of sims. This includes determining, for each student level, the numbers of students to be simulated, the base transition probabilities for graduation and dropout, and the distribution of student numbers by year in program. The characteristics defined in the student database specification (Section 3.1 of Td_2.2) must also be determined for each sim.
4.1 Student Numbers
Rows 5-8 of the Student_Pars spreadsheet determine student numbers by year in program. Step 1 extracts enrollment figures from the initialization database (row 7). However, the total number of sims must not exceed that which can be handled by the CyberCampus code (H5 provides a tentative figure). Row 8 scales the raw enrollments so their sum doesn’t exceed the maximum.
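The scaling in row 8 might look like the following sketch. It assumes proportional scaling applied only when the raw total exceeds the cap, with rounding down; the spreadsheet may round differently. The function name, the enrollment figures, and the cap of 2,000 sims are hypothetical.

```python
from typing import Dict


def scale_enrollments(raw: Dict[str, int], max_sims: int) -> Dict[str, int]:
    """Scale raw enrollments proportionally so their sum stays within max_sims."""
    total = sum(raw.values())
    if total <= max_sims:
        return dict(raw)  # already within the cap, so nothing to scale
    factor = max_sims / total
    # Round down so the scaled total can never exceed the cap.
    return {level: int(count * factor) for level, count in raw.items()}


# Illustrative enrollments by student level (not taken from the dataset).
raw_enrollments = {"SL-1": 6000, "SL-2": 1500, "SL-3": 1500, "SL-4": 800, "SL-5": 200}
sims = scale_enrollments(raw_enrollments, max_sims=2000)
```

Rounding down can leave the total a few sims short of the cap, which seems harmless for initialization.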
4.2 Transition Probabilities
Students enter into year 1 of their programs during play of the game, and then drop out, graduate, or continue to the next year depending on the relevant base transition probabilities and course-taking achievements. The base transition probabilities may be adjusted for individual student sims depending on the student's satisfaction index (response function to come).
The determination of the base transition probabilities for undergraduate students appears in rows 13-20 of the Student_Pars spreadsheet. The first step, cells A15 and A19, extracts the graduation percentages for traditional and non-traditional students from the initialization dataset. Column B scales these figures up to account for the fact that the dataset provides the percentage graduating in five years (assumed to be eight years for non-traditional students). The dropout percentage is simply 100 minus the graduation percentage. Cells R14:Y15 contain our own assumptions about the distributions of time to degree. For traditional students who graduate, 75% are assumed to do so in four years, another 15% after their fifth year, and so on. Cells I14:P15 present the same assumptions for dropouts: 26% of dropouts do so after their first year, another 23% after their second year, etc. The cells in each set add up to one.
Cells I19:P20 calculate the base transition probabilities for dropout. These figures, which will vary with institutional characteristics but not by department, are used to determine whether individual student sims drop out or continue on to the next program year. Graduation is not determined according to a transition probability, but rather by when a student passes the courses required for graduation. However, the graduation probabilities are needed to calculate the dropout transition probabilities for subsequent years.
The dropout base transition probabilities ($\mathit{dropProb}_t$) are obtained by solving the following recursion equation for each year $t$:
$$\mathit{dropProb}_t \cdot \mathit{frSurvivors}_t = (1 - \mathit{gradRate}) \cdot \mathit{frDropOuts}_t \tag{1}$$
where $\mathit{frDropOuts}_t$ are the figures contained in cells I14:P15, $\mathit{gradRate}$ is in A15, and $\mathit{frSurvivors}_t$ represents the fraction of the original entering class who remain in the program in year $t$. The expression $(1 - \mathit{gradRate})$ equals the overall dropout rate, as noted above. The usage of $\mathit{dropProb}_t$ as a transition probability is clearly seen in the formula: it is applied to the students who survive to year $t$.
Equation (1) becomes more complex as $t$ increases because the fraction of survivors in each year depends on all the prior graduation and dropout probabilities. Because 100 percent of the entering class has "survived" to the beginning of the first program year, $\mathit{frSurvivors}_1 = 1$. In general, however, $\mathit{frSurvivors}_t = (1 - \mathit{gradProb}_{t-1} - \mathit{dropProb}_{t-1}) \cdot \mathit{frSurvivors}_{t-1}$. The new variable $\mathit{gradProb}_t$ is the transition probability for graduation: that is, the probability that a student who remains in the program at year $t$ will graduate at the end of that year. It should not be confused with $\mathit{gradRate}$, which is the overall graduation rate. Because graduation will be determined by meeting course requirements, $\mathit{gradProb}_t$ represents an approximation to be used only in the calculation of $\mathit{dropProb}_t$. Equation (2) illustrates the calculation for $\mathit{dropProb}_3$.
$$\mathit{dropProb}_3 = \frac{(1 - \mathit{gradRate}) \cdot \mathit{frDropOuts}_3}{(1 - \mathit{gradProb}_1 - \mathit{dropProb}_1)(1 - \mathit{gradProb}_2 - \mathit{dropProb}_2)} \tag{2}$$
The spreadsheet formulas provide the complete set of calculations.
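The recursion can be sketched in a few lines. The sketch reads equation (1) as dropProb_t = (1 - gradRate) * frDropOuts_t / frSurvivors_t and ASSUMES that gradProb_t is approximated in the same way from gradRate and the time-to-degree distribution; the spreadsheet may compute it differently. Only the 75%, 15%, 26%, and 23% figures come from the text; the other distribution values, the graduation rate of 0.60, and the function name are illustrative.

```python
from typing import List, Tuple


def transition_probabilities(
    grad_rate: float,
    fr_grads: List[float],
    fr_drop_outs: List[float],
) -> Tuple[List[float], List[float], List[float]]:
    """Return (survivors, grad_prob, drop_prob), one entry per program year."""
    survivors: List[float] = []
    grad_prob: List[float] = []
    drop_prob: List[float] = []
    fr_survivors = 1.0  # the whole entering class survives to the start of year 1
    for fr_grad, fr_drop in zip(fr_grads, fr_drop_outs):
        survivors.append(fr_survivors)
        if fr_survivors <= 1e-12:
            # Nobody is left, so no transitions can occur in this year.
            grad_prob.append(0.0)
            drop_prob.append(0.0)
            continue
        # Equation (1) solved for dropProb_t.
        d = (1.0 - grad_rate) * fr_drop / fr_survivors
        # ASSUMPTION: gradProb_t is approximated analogously from gradRate.
        g = grad_rate * fr_grad / fr_survivors
        grad_prob.append(g)
        drop_prob.append(d)
        # Survivors recursion: frSurvivors_(t+1) = (1 - g - d) * frSurvivors_t.
        fr_survivors *= 1.0 - g - d
    return survivors, grad_prob, drop_prob


# Time-to-degree and time-to-dropout distributions over eight program years.
FR_GRADS = [0.0, 0.0, 0.0, 0.75, 0.15, 0.06, 0.03, 0.01]
FR_DROP_OUTS = [0.26, 0.23, 0.18, 0.13, 0.09, 0.06, 0.03, 0.02]
survivors, grad_prob, drop_prob = transition_probabilities(0.60, FR_GRADS, FR_DROP_OUTS)
```

Because both distributions sum to one, every student has either graduated or dropped out by the end of the last year, so the two transition probabilities for that year sum to one.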
While masters students (SL-3) generally complete their degrees in one year, the exact timing is once again determined by the completion of course requirements. Rows 53-60 of the spreadsheet provide equivalent calculations to those for undergraduates. (No adjustment to the overall graduation probability is needed.) As for undergraduates, the longer it takes to complete the requirements, the greater the chance of dropping out before completion. No transition probabilities are needed for non-matriculated students (SL-5).
The doctoral dropout and graduation probabilities differ from those for undergraduate and masters students in several respects. Graduation is time-dependent because students must complete a thesis after satisfying course requirements. Hence the graduation of individual sims will be determined by drawing random numbers against the relevant base transition probabilities. Research by Massy and Goldman has confirmed that graduation rates vary significantly by field as well as institution. The same research provides insights on mean time to graduation for various science and engineering fields.
Cells A99 and A102 extract the overall Ph.D. graduation rate and mean time to degree from the initialization dataset. (The "Master:Main" spreadsheet now calculates graduation rates and mean times to degree for each institution as s-functions of the school's doctoral and research ratings.) Cell B99 contains our assumption about the ratio of mean times for dropout and graduation, and B102 gives the resulting mean time to dropout. We assume that the frequency distributions of time to graduation and time to dropout are exponential, so the entire distribution is specified once the mean is known. (This is an easy way to get the mean times into the calculation.) The time to dropout begins upon matriculation whereas the time to graduation begins when a student has completed his or her coursework. Coursework is assumed to be completed in two years for purposes of the initialization calculations.
Cells F106:H133 translate the institution-wide graduation percentage and mean times to graduation and dropout to departmental-level figures using the multipliers specified on the Dept_Master sheet. The graduation percentage is limited to an arbitrary 80%, though it is likely that the maximum will never be reached.
Cells I106:P133 calculate the base graduation transition probabilities for each department, and R106:Y133 do the same for the dropout probabilities. The calculations follow the same logic as for the other student levels except that $\mathit{frGrads}_t$ and $\mathit{frDropOuts}_t$ are determined from the aforementioned exponential distribution. The formula for the fraction of graduations at time $t$ is:
(3)
The intermediate variable $\mathit{uncondFrGrads}_t$ reflects the probability of graduating at time $t$ given that the student will graduate sometime ("2" has been subtracted from the numerators and denominators to account for the first two years of course-taking behavior). The final result, $\mathit{frGrads}_t$, adjusts for the overall probability of graduation and converts the unconditional probability to a probability conditioned on being in the program at the beginning of year $t$. The formula for $\mathit{frDropOuts}_t$ is the same except that "2" is not subtracted.
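For orientation only, the sketch below shows one plausible way to discretize an exponential time-to-graduation distribution by program year, with the two coursework years subtracted from both the elapsed time and the mean. It is NOT a transcription of formula (3); the spreadsheet's exact expression may differ. The function name, the eight-year horizon, and the six-year mean are assumptions.

```python
import math
from typing import List


def uncond_fr_grads(
    mean_time_to_degree: float,
    coursework_years: int = 2,
    horizon: int = 8,
) -> List[float]:
    """Fraction of eventual graduates finishing in each program year 1..horizon.

    ASSUMPTION: the thesis phase is exponential with mean
    (mean_time_to_degree - coursework_years) and starts after coursework.
    """
    post_mean = mean_time_to_degree - coursework_years
    if post_mean <= 0.0:
        raise ValueError("mean time to degree must exceed the coursework period")
    fractions = []
    for t in range(1, horizon + 1):
        if t <= coursework_years:
            fractions.append(0.0)  # no graduations during coursework
            continue
        start = (t - 1 - coursework_years) / post_mean
        end = (t - coursework_years) / post_mean
        # Probability mass of the exponential distribution within year t.
        fractions.append(math.exp(-start) - math.exp(-end))
    return fractions


fractions = uncond_fr_grads(mean_time_to_degree=6.0)
```

The fractions sum to slightly less than one because the exponential tail extends beyond the horizon; a production version would need to decide how to treat that remainder.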
4.3 Year In Program
Determination of student numbers by program year, needed for initialization, proceeds in two steps. Step 1 calculates the sum of all the "survivors" fractions, which were described just before equation (2), to a sufficiently large value of $t$ for the remaining terms to be negligible. The result is labeled "Population divisor" (cell Z19 for traditional undergraduates). The first two terms are:
$$\mathit{popDivisor} = 1 + (1 - \mathit{gradProb}_1 - \mathit{dropProb}_1) + \cdots \tag{4}$$
In Step 2, the fraction of students initially in each program year is obtained by dividing the corresponding term in (4) by $\mathit{popDivisor}$. For example, the fraction in year 1 of their programs is $1/\mathit{popDivisor}$ and the fraction in year two equals the second term of (4) divided by $\mathit{popDivisor}$.
The resulting fractions represent an approximate steady state for the student transitions model.
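The two steps can be sketched as below, assuming the transition probabilities are already available as lists indexed by program year. The function name and the probabilities in the example are hypothetical.

```python
from typing import List


def year_in_program_fractions(grad_prob: List[float], drop_prob: List[float]) -> List[float]:
    """Return the steady-state fraction of students in each program year."""
    survivors = []
    fr_survivors = 1.0  # everyone is present at the start of year 1
    for g, d in zip(grad_prob, drop_prob):
        survivors.append(fr_survivors)
        fr_survivors *= 1.0 - g - d
    # Step 1: the population divisor is the sum of the survivor fractions.
    pop_divisor = sum(survivors)
    # Step 2: each year's share is its survivor fraction over the divisor.
    return [s / pop_divisor for s in survivors]


# Illustrative transition probabilities for six program years.
example_grad = [0.0, 0.0, 0.0, 0.60, 0.50, 0.90]
example_drop = [0.10, 0.08, 0.05, 0.05, 0.10, 0.10]
year_fractions = year_in_program_fractions(example_grad, example_drop)
```

Multiplying each fraction by the scaled enrollment for the student level gives the number of sims to generate for that program year.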
4.4 Gender-ethnic Group
The fraction of students in each gender-ethnic category is determined from the "percent female" and "percent minority" data in the game's initialization database. The data are provided separately for each student level. The distributions are assumed to be independent, so the fraction of minority females equals the fraction of minorities times the fraction of females.
The student sims generated at initialization will be assigned to gender-ethnic categories by drawing random numbers against these probabilities.
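A minimal sketch of the category probabilities and the random assignment follows. The category labels, function names, and the 55 percent and 20 percent inputs are hypothetical; only the independence assumption comes from the text.

```python
import random
from typing import Dict


def gender_ethnic_probabilities(frac_female: float, frac_minority: float) -> Dict[str, float]:
    """Joint category probabilities under the independence assumption."""
    return {
        "minority female": frac_minority * frac_female,
        "minority male": frac_minority * (1.0 - frac_female),
        "non-minority female": (1.0 - frac_minority) * frac_female,
        "non-minority male": (1.0 - frac_minority) * (1.0 - frac_female),
    }


def draw_category(probabilities: Dict[str, float], rng: random.Random) -> str:
    """Assign one sim to a category by drawing against the probabilities."""
    categories = list(probabilities)
    weights = [probabilities[c] for c in categories]
    return rng.choices(categories, weights=weights, k=1)[0]


rng = random.Random(99)
probs = gender_ethnic_probabilities(frac_female=0.55, frac_minority=0.20)
category = draw_category(probs, rng)
```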
4.5 Talent Indices
The talent indices for academics, extracurricular activities, and athletics for each student sim will be determined by random draws from appropriate non-rectangular probability distributions. The parameters will depend on the institution's characteristics and applications, acceptances, and matriculations for student segments as represented in the initialization data [algorithm to come].
4.6 Financial Aid
The dollar amounts of need-based aid and merit-based aid for each sim will be determined by random draws from appropriate non-rectangular probability distributions. The parameters will depend on financial aid policies and the talent indices [algorithm to come].
4.7 Major Field
Sheet "Student_Pars" provides the data for assigning students to fields and, where applicable, for determining their preference for elective courses. There is a separate data block for each student level. Column A calls out the department name. Column B is the IPEDS broad field definition, which is needed to link the departmental data to the game's initialization and benchmark-institution datasets. Columns C-E determine the student's major field preference, and columns G-AH provide preference data for elective courses.
The first eight departments (for undergraduates, rows 24-31) have a unique mapping to the IPEDS database. Student preferences for these departments equal the fraction of degrees reported in the game's initialization dataset (columns C and E). Things become more complicated when the mapping isn’t one-to-one. To handle these cases we first extract the IPEDS percentage (column C) and then multiply by a "fraction of field pct" (column D) to get the "major field preference percentages" reported in column E. The fractions may have to be set by judgment, but we would like to identify relevant data if at all possible.
The major field preference percentages must be adjusted to reflect the array of departments being included in the given game session. The "adjusted major field preference percentages" will be obtained by dividing the percentages in column E by the sum of the percentages for the included departments. (The calculation is not illustrated in the spreadsheet.)
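Since the adjustment is not illustrated in the spreadsheet, a minimal sketch may help. It computes column E as column C times column D (with a fraction of 1 where the IPEDS mapping is one-to-one) and then renormalizes over the departments in the current game session. The department names, percentages, and function name are hypothetical.

```python
from typing import Dict, Iterable


def adjusted_major_preferences(
    ipeds_pct: Dict[str, float],
    fraction_of_field: Dict[str, float],
    included: Iterable[str],
) -> Dict[str, float]:
    """Renormalize major field preferences over the included departments."""
    # Column E = column C (IPEDS percentage) x column D (fraction of field).
    raw = {dept: ipeds_pct[dept] * fraction_of_field.get(dept, 1.0) for dept in included}
    total = sum(raw.values())
    if total <= 0.0:
        raise ValueError("included departments have no preference weight")
    return {dept: value / total for dept, value in raw.items()}


# Illustrative inputs: Physics and Astronomy share one IPEDS field.
ipeds = {"English": 8.0, "History": 6.0, "Physics": 4.0, "Astronomy": 4.0}
fractions_of_field = {"Physics": 0.75, "Astronomy": 0.25}
adjusted = adjusted_major_preferences(ipeds, fractions_of_field, ["English", "History", "Physics"])
```

The adjusted values sum to one, so they can be used directly as probabilities when drawing a major for each sim.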
Undergraduates do not enter the university with a major already specified, but rather select one between their first and second years of study. Because they fall into an "undecided" category, only the General Education requirement applies during their first year. All new traditional and non-traditional undergraduates will be "undecided." The sims for second- and subsequent-year undergraduates and all students at other levels will be assigned to majors by drawing random numbers against the adjusted major field percentages.
4.8 Elective Course Preferences
The Student_Pars sheet provides the preferences for elective courses in columns G-AH. (Column F is a control total, not needed in the game itself.) The column headings represent CyberCampus broad field definitions noted in connection with the Dept_Master sheet. The percentages are used to determine the probabilities for selecting electives in the course choice algorithm.
Complications arise when the major department (a row in the table) calls for an elective department that is not currently in the game. This is resolved as follows: (a) if one or more departments in the same "Game Field" (col C of Dept_Master) are currently in the game, distribute the excluded department's probability over the included one(s) in proportion to the latter's probability value(s); (b) if not, distribute across all included departments (regardless of field) in proportion to their probabilities. The major department itself is always excluded from the distributions.
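Rules (a) and (b) can be sketched as follows for a single row of the preference table. The sketch assumes that proportions are taken from the original (pre-redistribution) probabilities, and that rule (b) also applies when the same-field departments in the game all have zero probability; neither point is settled by the text. Department names, Game Field labels, and the function name are hypothetical.

```python
from typing import Dict, Set


def redistribute_electives(
    prefs: Dict[str, float],
    game_field: Dict[str, str],
    in_game: Set[str],
    major: str,
) -> Dict[str, float]:
    """Reassign probability from elective departments that are not in the game."""
    # Included departments keep their own probabilities.
    result = {d: p for d, p in prefs.items() if d in in_game}
    # The major department never receives redistributed probability.
    candidates = [d for d in result if d != major]
    for dept, prob in prefs.items():
        if dept in in_game or prob <= 0.0:
            continue
        # Rule (a): included departments in the same Game Field.
        targets = [d for d in candidates if game_field[d] == game_field[dept]]
        if sum(prefs[d] for d in targets) <= 0.0:
            # Rule (b): fall back to all included departments (ASSUMED to
            # apply also when same-field departments carry zero probability).
            targets = candidates
        base = sum(prefs[d] for d in targets)
        if base <= 0.0:
            continue  # nothing to distribute over
        for d in targets:
            result[d] += prob * prefs[d] / base
    return result


fields = {"English": "Humanities", "History": "Humanities", "Philosophy": "Humanities",
          "Physics": "Sciences", "Chemistry": "Sciences", "Music": "Arts"}
english_row = {"History": 0.30, "Philosophy": 0.20, "Physics": 0.10, "Chemistry": 0.40}
playing = {"English", "History", "Physics", "Chemistry"}
electives = redistribute_electives(english_row, fields, playing, major="English")
```

In this example Philosophy is not in the game, so its 0.20 moves to History, the only other Humanities department present, and the row still sums to one.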
Data on the choice of course and majors may be available in the research files of the Institute for Research on Higher Education (IRHE) and the University of Pennsylvania.
The courses taken by each student sim will be obtained by running the course choice algorithm, unconstrained by faculty numbers, from the beginning of the student's program through the year prior to the sim's year-in-program. The course choice algorithm is run without faculty constraints because faculty numbers will be determined later, as a function of departmental teaching requirements.
The beginning value for the student's list of courses taken includes work through the prior year, but not the year to be simulated in the first round of game play. All courses-taken variables for year-1 students equal zero, since these students have just entered the university. Courses-taken for non-matriculated students also equal zero, since there is no need to accumulate courses against requirements.
The course choice initialization determines the number of courses required to be taught by each department as well as initializing the faculty sims. For undergraduates, we approximate the total teaching requirement by: (a) summing the courses taken for all student sims during their most recent program year; and then (b) running course-choice for the year-three students based on their previously-calculated initial conditions and adding the result to (a). The same procedure will be applied for doctoral students, except that the "extra" year will be year 2 rather than year 4. (Doctoral students take courses for only two years.) The effect will be to accumulate the course-choice totals for students in each year of their programs. For masters and non-matriculated students, we need only accumulate the courses actually "taken" during the initialization run.
No initial values for the satisfaction indices will be computed. Satisfaction will be determined for each student sim during the first semester of game play by ignoring the latency effect.
The initialization parameter-generating procedures are illustrated using data for the Biology Department and Template A. The procedures calculate initialization values for the following departmental variables by age/rank category:
Part 1) teaching, scholarship, and research talent;
Part 2) principal investigators and average research project size;
Part 3) faculty numbers;
Part 4) percentage breakdown by gender-ethnic group;
Part 5) average salary by gender-ethnic group;
Part 6) base transition probabilities for faculty promotion and retention.
Section 5.7 describes how the "Initialization_Spreadsheets" procedures will be used in the game's initialization algorithms.
Average teaching, scholarship, and research talent indices are obtained from the departmental and age/rank data using the following s-shaped curve:
(5)
The formula takes account of the department talent index and the age/rank talent multiplier while ensuring that the result will lie between 0 and 100. It is applied to each talent index and each age/rank cell, except for adjuncts who do no research or scholarship (cells B5:D11).
The row labeled "Talent Variance Factor" provides the information needed to specify the probability distribution for determining the talent indices for the individual sims. A variance measure is required because this probability distribution is non-rectangular.
Column B calculates adjusted research dollars per tenure-line faculty member by multiplying the research per faculty from the initialization dataset times the departmental and age/rank multipliers.
Column C calculates the fraction of faculty who are principal investigators by multiplying the departmental percentage of faculty who are PIs times the research talent index. Recall that the departmental faculty PI percentage reflects the market potential that could be achieved if all faculty had research talent ratings of 100. Multiplying by the actual rating conforms the PI fraction to the talent in each age/rank category of the player-generated institution.
The last column determines the expected number of research projects per principal investigator. Dividing research dollars per faculty member by the PI fraction produces research dollars per PI, which is then divided by the departmental normal project size to determine the expected number of projects per PI. The result is truncated at 4 because CyberCampus allows for only four research projects to be active at any one time.
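The three columns of Part 2 can be sketched for a single age/rank cell as shown below. The function and argument names are hypothetical. The sketch assumes the departmental PI percentage is expressed as a fraction and that the research talent index is divided by 100 before multiplying, so that a rating of 100 reproduces the departmental figure.

```python
MAX_PROJECTS = 4  # CyberCampus allows four active research projects at a time


def research_parameters(research_per_faculty, dept_mult, rank_mult,
                        dept_pi_fraction, research_talent, normal_project_size):
    """Return (adjusted dollars per faculty, PI fraction, projects per PI)."""
    # Column B: research dollars per tenure-line faculty member, adjusted.
    adjusted_dollars = research_per_faculty * dept_mult * rank_mult
    # Column C: market-potential PI fraction scaled by talent (assumed 0-100).
    pi_fraction = dept_pi_fraction * research_talent / 100.0
    if pi_fraction <= 0:
        return adjusted_dollars, 0.0, 0.0
    # Last column: dollars per PI over normal project size, truncated at 4.
    dollars_per_pi = adjusted_dollars / pi_fraction
    projects_per_pi = min(dollars_per_pi / normal_project_size, MAX_PROJECTS)
    return adjusted_dollars, pi_fraction, projects_per_pi


# Made-up figures (dollars in thousands).
typical = research_parameters(60.0, 1.0, 1.0, 0.8, 50.0, 100.0)
capped = research_parameters(900.0, 1.0, 1.0, 0.8, 50.0, 100.0)
```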
The determination of departmental faculty numbers begins by extracting the percentage in the age/rank category from the initialization dataset (Column A of Part 3-intermediate). These figures are multiplied by the departmental multipliers in Column B of Faculty_Templates and the results summed. Column D normalizes the results to obtain the final adjusted age/rank percentage figures.
The total number of departmental faculty, to which the age/rank percentages will be applied, depends on the department's teaching and research requirements and the "normal per-semester teaching load" for the institution. The institution-wide normal load is calculated in cell D25. It varies with the "doctoral students per faculty" and "sponsored research per faculty" ratings, which are variables in the game's initialization and benchmark-institution dataset. The formula is:
(6)
The resulting s-curve reduces the normal teaching load from an upper limit of 3.5 courses/semester, for institutions with the lowest doctoral and research ratings, to a lower limit of 1.75 for the most highly-rated institutions. The calculation is done once for the institution as a whole at the time of initialization.
Column B of Part 3 adjusts the normal per-semester teaching load for the institution by the departmental and age/rank multipliers. Column C adjusts the result for sponsored research commitments according to the following weighted average:
(7)
The second term, another s-shaped curve, reduces teaching load when the product of the PI fraction and the average project size becomes large. High research volumes reduce teaching loads through academic-year faculty-salary offsets or negotiations associated with hiring or retention efforts. Column D calculates the weighted average teaching load (cell D35) by summing the products of adjusted teaching load and the template figure for the fraction of faculty in the age/rank category.
The calculation for total faculty numbers (cell D47) divides "total courses required to be taught per semester" by the aforementioned normal teaching load per semester. "Total courses required to be taught per semester" was calculated in the student database initializations as described in Section 3.7. Test data were included on the Dept_Master sheet. Cells E39:E45 multiply total faculty by the age/rank percentages and round the results to the nearest integer to get faculty numbers by category.
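The faculty-number calculation can be sketched as follows. The names and sample figures are hypothetical, and the adjusted teaching loads are taken as given because they depend on equations (6) and (7). Rounding uses floor(x + 0.5), which matches the usual spreadsheet behavior for non-negative values; that choice is an assumption.

```python
import math


def faculty_numbers(total_courses_per_semester, adjusted_load, rank_fraction):
    """Return (weighted load, total faculty, faculty count by age/rank category).

    adjusted_load -- category -> adjusted per-semester teaching load (Part 3, col C)
    rank_fraction -- category -> fraction of faculty in the category
    """
    # Cell D35: weighted average teaching load across age/rank categories.
    weighted_load = sum(adjusted_load[c] * rank_fraction[c] for c in rank_fraction)
    # Cell D47: courses to be taught divided by the average load.
    total_faculty = total_courses_per_semester / weighted_load
    # Cells E39:E45: apply the percentages and round to the nearest integer.
    by_category = {c: math.floor(total_faculty * f + 0.5)
                   for c, f in rank_fraction.items()}
    return weighted_load, total_faculty, by_category


# Made-up figures for a four-category department.
loads = {"assistant": 2.0, "associate": 2.5, "full": 2.0, "adjunct": 3.0}
shares = {"assistant": 0.25, "associate": 0.25, "full": 0.25, "adjunct": 0.25}
weighted_load, total_faculty, by_category = faculty_numbers(95, loads, shares)
```

Because each category is rounded separately, the category counts need not sum exactly to the unrounded total.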
Part 4 of the spreadsheet begins by multiplying the institution-wide gender-ethnic percentages by the departmental and then the age/rank multipliers. The intermediate results, shown in B51:E57, are then normalized so that they sum to one (B60:E66).
Part 5 calculates average salary by age/rank category and gender-ethnic group by applying the appropriate multipliers to the institution-wide average salary for each rank. No normalization is required. As for the talent indices, the row labeled "Salary Variance Factor" provides the variance information needed to specify the probability distribution for salaries.
Each year, faculty may leave the institution, be promoted, continue in the current rank/age category, or continue in rank but advance in age. For any given sim, exit or promotion is determined by drawing random numbers against the transition probabilities for departure (first) and then promotion. (Departures may result from job switching, retirement, or death.) If both random draws "fail," the sim remains at the current rank. Age is advanced by one year, which may trigger movement into a higher age bracket in the case of full professors.
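The order of the two draws can be sketched as follows. The function name and return labels are hypothetical; the probabilities passed in would be the transition probabilities for the sim's department, rank, and history.

```python
import random


def annual_transition(p_depart, p_promote, rng=random):
    """Return "depart", "promote", or "stay" for one sim in one year.

    Departure is tested first; promotion is tested only if the sim stays.
    """
    if rng.random() < p_depart:
        return "depart"
    if rng.random() < p_promote:
        return "promote"
    return "stay"  # both draws "failed": the sim remains at its current rank


def advance_age(age):
    """Age advances by one year for every sim that does not depart."""
    return age + 1
```

Because promotion is tested only for sims that survive the departure draw, the unconditional chance of promotion in a year is (1 - p_depart) * p_promote rather than p_promote.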
Part 6 begins by determining the "raw transition probabilities" for promotion and departure. The raw probabilities are based on work by Massy and Goldman, part of a Sloan-sponsored study of the supply and demand for doctorates in the United States. They used multivariate logit analysis to estimate transition probabilities as functions of rank and institutional type for about a dozen science and engineering fields. The results are reproduced on the "Faculty_TransProbs" spreadsheet, along with guesstimates of the data for the other CyberCampus fields. To extract the appropriate figures from Faculty_TransProbs, the logic in cells B85:C89 must be expanded to reproduce the institutional types used by Massy-Goldman as functions of the CyberCampus research and doctoral ratings. The desired mapping follows:
WHICH[
res ≥ 6 AND doc ≥ 6, public or private research university (columns C or D) depending on CyberCampus's control type;
doc > 2, public or private doctoral university (columns E or F) depending on CyberCampus's control type;
control = "Private," private liberal arts college (column G);
control = "Public," public comprehensive university (column F)]
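The WHICH mapping can be expressed as a small function. The function name and the returned labels are hypothetical; the thresholds and the order of the tests are taken from the mapping above. Column letters are omitted, since they belong to the Faculty_TransProbs layout.

```python
def massy_goldman_type(res, doc, control):
    """Map CyberCampus research/doctoral ratings and control type to an
    institutional type used in the Massy-Goldman tables."""
    sector = control.strip().lower()
    if sector not in ("public", "private"):
        raise ValueError("control must be 'Public' or 'Private'")
    if res >= 6 and doc >= 6:
        return sector + " research university"
    if doc > 2:
        return sector + " doctoral university"
    if sector == "private":
        return "private liberal arts college"
    return "public comprehensive university"
```

The tests are evaluated in order, so an institution with a high research rating but a doctoral rating of 2 or less falls through to the liberal arts or comprehensive category.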
CyberCampus's base transition probabilities depend on the particular faculty member's history as well as his or her field and rank. For assistant professors, the promotion probability is nil during the first three years after in-hire, rises to a peak in the sixth year, and then plummets to a low value in year 7. The "percent promotions" figures in cells E84:E88 define this distribution. Multiplying by the overall promotion probability and adjusting for survivorship produces the year-by-year promotion probabilities in F84:F88. The departure probabilities are assumed equal across years until the seventh year, when the departure probability is set at one minus the promotion probability to "clear" the assistant professor cohort as required under the AAUP's tenuring rules.
Time-in-rank for associate and full professors is not as tightly constrained as it is for assistant professors, and the potential duration is much greater. Hence we ignore the survivorship adjustment and simply multiply the raw probabilities by factors that reflect the desired time effects. Recency of promotion reduces the base departure probability for both ranks—in other words, one is less likely to move soon after promotion. Associate professors’ promotion probabilities decline significantly after eight or ten years, since a person who hasn’t been promoted by then generally suffers from some impediment that makes promotion less likely in the future. The time effects are modeled in terms of s-curves, with parameters as provided in cells D104, E104, and E107. The faculty database must include entries for age and year of last promotion (in-hire date for assistant professors). Test data are given in C104:C107. Adjunct faculty members’ departure probabilities are not subject to time effects.
Unlike the student initializations, the faculty initializations and transition probabilities do not approximate a steady-state system. There are technical reasons why achieving such an approximation would be difficult. Perhaps more to the point, the real-world faculty rosters are rarely if ever in steady state due to constantly shifting conditions and long time constants.
The transition probabilities applied to individual sims in particular years are obtained by multiplying the base values discussed here by response functions that depend on satisfaction, in the case of departures, and performance and perhaps the incidence of an outside offer in the case of promotions. (These functions are yet to be specified.)
The faculty sims will be created by following this series of steps:
Step 1. Generate the number of faculty sims for each department and age/rank category as indicated in column E of part 3.
Step 2. Select an age for each sim using a rectangular random variable with range as shown in the age/rank definition.
Step 3. Randomly assign each sim to a gender-ethnic category using the probabilities given in part 4.
Step 4. Assign a salary to each sim using the appropriate mean and variance factor from part 5. (A single-humped probability distribution will be specified.)
Step 5. Assign teaching, scholarship, and research talent indices to each sim using the appropriate mean and variance factor from part 1. (A single-humped probability distribution will be specified.)
Step 6. Assign active research projects using the following sub-steps.
a) Determine whether each tenure-line faculty sim is a principal investigator by drawing a random number against the appropriate probability from column C of part 2.
b) For PIs, determine the number of projects by drawing a random number using the appropriate mean and the variance factor in column D of Part 2. The number of projects will be truncated at four, which is the number of project slots allowed for in the CyberCampus code.
c) For each project, determine the project size by drawing a random number using the mean project size and size variance factor given in column O of Dept_Master.
d) For each project, determine the remaining duration, in months, by drawing a random number between 1 and 12. (All projects are assumed to last one year, but the end date at initialization is random.)
Step 7. Assign research proposals using the following sub-steps.
a) Determine whether each tenure-line faculty sim has proposals outstanding: (i) if the sim has an active project, let the number of proposals equal the number of active projects provided that the sum of proposals and projects does not exceed four.
b) If the person is not a PI, use logic like that in steps 6(a) and 6(b) to determine the number of proposals.
c) For each proposal, determine the project size by drawing a random number using the mean project size and size variance factor given in column O of Dept_Master.
d) For each proposal, determine the time remaining before decision, in months, by drawing a random number between 1 and 4. (All proposals are assumed to be decided in four months, but the date at initialization is random.)
Step 8. Determine the fraction of active research project support used to offset the faculty member's academic-year salary. (This element should be added to the faculty database.) [The fraction will be based on the total volume of active research projects: algorithm to come.]
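A partial sketch of Steps 2, 3, 6(a), 6(b), and 6(d) for a single sim follows. All names are hypothetical. The single-humped distributions have not yet been specified, so the project count below uses a rounded normal draw with an arbitrary spread purely as a placeholder; salaries, talent indices, project sizes, and proposals are left out for the same reason.

```python
import random

MAX_PROJECTS = 4  # project slots allowed for in the CyberCampus code


def make_faculty_sim(age_range, group_probs, pi_probability, mean_projects, rng):
    """Create one faculty sim as a dictionary (illustrative layout only)."""
    low, high = age_range
    groups = list(group_probs)
    weights = [group_probs[g] for g in groups]
    sim = {
        "age": rng.randint(low, high),                       # Step 2: rectangular
        "group": rng.choices(groups, weights=weights)[0],    # Step 3
        "projects": [],
    }
    if rng.random() < pi_probability:                        # Step 6(a)
        # Step 6(b): placeholder distribution (assumption), truncated at four.
        count = round(rng.gauss(mean_projects, 0.5))
        count = max(1, min(MAX_PROJECTS, count))
        # Step 6(d): remaining duration in months, uniform on 1..12.
        sim["projects"] = [{"months_remaining": rng.randint(1, 12)}
                           for _ in range(count)]
    return sim


rng = random.Random(2024)  # fixed seed so an initialization run can be repeated
sims = [make_faculty_sim((30, 39), {"group_1": 0.6, "group_2": 0.4}, 0.5, 1.5, rng)
        for _ in range(200)]
```

Adjunct sims, who do no research, can be generated by passing a PI probability of zero.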
The other variables in the faculty database will be calculated for the first time during the first semester to be simulated. Latency will be eliminated during the first-time calculation.
Running the initialization procedures to this point has produced the following financially-relevant data at the level of the student and faculty sims:
• Numbers of students paying tuition. (Tuition rates for graduate students may vary across departments.) Can be aggregated to produce institution-wide gross tuition.
• Financial aid dollars awarded to each student. Can be aggregated to produce institution-wide financial aid. Subtracting the result from total gross tuition produces total net tuition.
• The academic-year salary of each faculty member. Can be aggregated to produce total faculty salaries. Adding an assumed fringe benefit rate (say, 30%) produces total faculty compensation. This is the column total for "AY faculty compensation" in the expenditure section, Table 2 of Td_2.3.
• Academic-year faculty salary offsets to research projects. Aggregation produces the entry for AY faculty compensation in the "Sponsored Research" line of the aforementioned expenditure table. Subtracting AY salary offsets from total AY faculty compensation produces the same column entry for the "Academic Departments" line. Subtracting AY salary offsets from total sponsored research expenditures produces the sum of staff salaries for research and other research expense. (The sum can be separated into its components using information in the initialization dataset.)
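The aggregations listed above can be sketched as follows. The record layouts and field names are hypothetical, and the sketch assumes that the salary offsets are already expressed in compensation dollars (that is, inclusive of fringe benefits), which the text does not state.

```python
def aggregate_financials(students, faculty, fringe_rate=0.30):
    """Roll sim-level data up to institution-wide totals.

    students -- list of dicts with "tuition" and "aid"
    faculty  -- list of dicts with "salary" and "ay_offset"
    """
    gross_tuition = sum(s["tuition"] for s in students)
    financial_aid = sum(s["aid"] for s in students)
    total_salaries = sum(f["salary"] for f in faculty)
    # Total AY faculty compensation: salaries plus the assumed fringe rate.
    compensation = total_salaries * (1.0 + fringe_rate)
    # AY compensation charged to the "Sponsored Research" line.
    sponsored = sum(f["ay_offset"] for f in faculty)
    return {
        "gross_tuition": gross_tuition,
        "financial_aid": financial_aid,
        "net_tuition": gross_tuition - financial_aid,
        "ay_compensation_total": compensation,
        "ay_compensation_sponsored_research": sponsored,
        "ay_compensation_academic_departments": compensation - sponsored,
    }


# Made-up figures (dollars in thousands).
totals = aggregate_financials(
    students=[{"tuition": 20.0, "aid": 5.0}, {"tuition": 20.0, "aid": 0.0}],
    faculty=[{"salary": 100.0, "ay_offset": 13.0}, {"salary": 80.0, "ay_offset": 0.0}],
)
```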
The problem now is to conform the revenue and expense data from the initialization database with the tuition, financial aid, and faculty salary data embedded in the student and faculty sims.
The calculation proceeds in two steps:
Step 1. Scale the remaining revenue items (i.e., items other than tuition, financial aid, and sponsored research) and all the balance sheet items according to the ratio of net tuition as calculated from the student database to net tuition as obtained from the initialization dataset.
Suppose, for example, that the calculated net tuition is $55,000 (i.e., $55 million), as compared to $159,705 in column U of the initialization database. The resulting ratio, 55000/159705=0.344, will be used to scale the revenue items in columns X through AB of the initialization dataset. The balance sheet items, in columns L through R, will be scaled by the same ratio. In other words, the balance sheet and revenue items other than sponsored research will be 34 percent of those in the initialization dataset. The difference arises because the number of simulated students will be less than the actual number of students for large institutions.
Step 2. Scale the remaining expenditure items (i.e., items other than faculty salaries and other sponsored research expenditures) by the value of k in the following equation:
The solution is:
We may wish to adjust the surplus-deficit at random, in response to the player's initialization specifications, or to reflect a given scenario.
The value of k will always be positive. Once calculated, it conforms the revenue-expenditure balance to the desired surplus or deficit while maintaining the links between the student and faculty databases and the financial statements.
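The two conforming steps can be sketched as follows. Step 1 follows the worked example directly. For Step 2 the sketch assumes the simplest balance relation consistent with the description, namely total revenue minus (fixed expenditures plus k times the remaining expenditures) equals the desired surplus; this form, the function names, and the sample figures are assumptions for illustration, not the document's own formula.

```python
def tuition_scale_ratio(calculated_net_tuition, dataset_net_tuition):
    """Step 1: ratio used to scale remaining revenue and balance sheet items."""
    return calculated_net_tuition / dataset_net_tuition


def scale_items(items, factor):
    """Multiply every item in a name -> amount mapping by the same factor."""
    return {name: amount * factor for name, amount in items.items()}


def solve_k(total_revenue, fixed_expenditures, scalable_expenditures,
            target_surplus=0.0):
    """Step 2 (assumed form): solve
    total_revenue - (fixed_expenditures + k * scalable_expenditures) = target_surplus
    for k. Fixed expenditures are faculty compensation and other sponsored
    research expenditures, which come from the sims and are not rescaled."""
    if scalable_expenditures <= 0:
        raise ValueError("there must be some expenditure left to scale")
    return (total_revenue - fixed_expenditures - target_surplus) / scalable_expenditures


ratio = tuition_scale_ratio(55_000, 159_705)              # figures from the text
scaled_revenue = scale_items({"gifts": 20_000.0}, ratio)  # hypothetical item
k = solve_k(total_revenue=1_000.0, fixed_expenditures=400.0,
            scalable_expenditures=500.0, target_surplus=100.0)
```

Under this assumed form, k is positive whenever revenue exceeds the fixed expenditures plus the target surplus.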
The initialization procedures extract a group of so-called benchmark institutions at the same time as the schools used to develop the game's initial conditions. The procedures described in this paper can be used to calculate data for the benchmark schools, just as for the player-generated institution. The list of benchmarking variables will be established in light of the game's reporting needs and the possibilities offered by the procedures described herein.