Methodology

The core of the study is a participant observation of IBM's IBMPC computer conferencing facility. This participant observation started in September, 1983 and continues today. The observation data has been supplemented, however, with archival data, including nearly complete transcripts of the two years of IBMPC's existence that predate the beginning of the participant observation. Several surveys of IBMPC's users, spaced at roughly two-year intervals, were also conducted. Finally, we have drawn on typological descriptions of a variety of media in an attempt to create a context for computer conferencing in the world of media.

This study freely intermingles these four broad sources of data, each of which has an associated methodology.

Methodology for the Participant Observation

The essence of participant observation is summarized with peculiar clarity by Turkle (1984) in the methodology notes of The Second Self: Computers and the Human Spirit:

The style of this work is ethnographic. Like the anthropologist who lives in an isolated village in a far-off place to get to know its inhabitants, their ways of seeing and doing things, their myths and rituals, their economy and artifacts, I lived with worlds that were new to me, tried to understand what they are about, and tried to write about my understandings so that the worlds I studied could come alive for others.

This kind of enterprise stands in an ambiguous relationship to science and art. The research is systematic: one informant's account of how something is done is checked and rechecked against the accounts of others; careful note is taken of what people do so that this can be compared with what they say they do. But at the same time, the very process of research is interpretive.

The art of participant observation entails a delicate balance. To observe in the real world one must participate in that which is observed. The closer the participation, the more detailed the observation may be. Indeed, the most successful participant observer will frequently be the observer who blends entirely into the community observed (accepted as a member of the community rather than an observer) and substitutes the community's ways of thinking for his or her own. There is a risk, however, in participating too closely: one may lose one's objectivity concerning events. The line between observing for maximal detail and participating so closely as to lose objectivity may be difficult to recognize.

This study required a closer level of participation than may be the case in other participant observations. When Henry (1973) observed families in Pathways to Madness, he was a short-term guest for each family. No observations lasted longer than a week. He was never anything but an observer. He lived in only one of the homes he observed. He never initiated action, preferring to follow the lead of the observed. When Turkle (1984) observed computer users for The Second Self, she was usually a short-term observer and interviewer of individuals. Few interactions spanned more than a few months, and she was always a guest, an outside observer looking in. When ethnographers visit cultures for extended observations, they are almost always guests, usually with no formal responsibilities other than their observation.

The same cannot be said of this observation. I have been an employee of IBM since September, 1983. By virtue of working in the group that has developed and maintained IBMPC since December, 1981, I have been involved with IBMPC in a variety of capacities over the course of the study. Some of these capacities have included some measure of administrative responsibility for the conferencing facility. My management has supported this research because of its belief in conferencing, but the study has never been the focus of my job responsibilities, which have generally entailed consulting and software development.

Possible Benefits

This arrangement has clearly been beneficial in some ways. The responsibilities have given me a view of IBM that I would not have had otherwise; a view that frequently adds value to the observations of computer conferencing. It has allowed me to experience, first hand, the impact of IBMPC and computer conferencing on IBM's software development process. This impact will be clear in the survey results and discussions of the effects of the IBMPC computer conferencing facility, but the value of computer conferencing to such development is clearer when one understands the formal process through which products are developed in IBM.

Partly in consequence of this product development work, the arrangement has also given me freedom to travel around the company. Partly in consequence of my association with the group that administers IBMPC, I have had great freedom to interact with IBMPC participants, including high-level executives in IBM, with whom I would never have interacted otherwise. My conversations with these people have not always involved discussion of computer conferencing. These interactions have, however, created a view of IBM's culture, business requirements, and the way IBM employees look at each other that I would never have had otherwise.

Some elements of this view can be seen in books like Watson's (1963) A Business and its Beliefs: The ideas that helped build IBM, Rodgers's (1986) The IBM Way: Insights into the World's Most Successful Marketing Organization, and Sobel's (1981) IBM: Colossus in Transition. Other elements of this view simply cannot be understood except from the perspective of being an employee of IBM and interacting with others.

The risk

These opportunities are not, however, without risk. The study was conducted, for the most part, outside of working hours. It is possible, moreover, that in participating in the IBM culture so closely I have effectively been assimilated into that culture. If so, I may not be able to assess what I have observed with proper objectivity. It may be that I will view as perfectly reasonable aspects of IBM and the IBMPC computer conferencing facility that people outside IBM might view as unusual or even outrageous.

This risk may be magnified by the observer's close participation in the IBMPC computer conferencing facility, where he has sometimes been counted among the more prolific contributors to IBMPC. The risk may also be magnified by the observer's membership in the group that manages and administers IBMPC, and the observer's occasional role as one of IBMPC's administrators. This position has allowed observation not only of the daily life of the IBMPC computer conferencing facility, but of the decision processes of people who believe in the potential of computer conferencing, and who have worked hard to make it succeed in IBM. Still, there remains a risk that the author may have been too much of a participant, and not enough of an observer:

The risk, in each case, is that the participation compromised objectivity and potentially blinded the observer to things that would have been seen otherwise. I am not in a position to make this assessment, and leave such judgements to the reader.

Genre and Theory

The presentation of the participant observation will hinge, to some extent, on the treatment of types of computer conference as if they constituted distinctive literary genres. It has generally been assumed that where a distinctive "genre" of computer conference can be identified, it would be possible to identify both generic processes and associated functional applications of the genre, functional explanations of the generic process. This entails an assumed relationship between effect and practice, as expressed in genre, that dates back to the beginning of this study. The larger theoretical structure that integrates these constructs (incorporating generic processes as an element of practices) is comparatively recent. This device has proved a powerful one, and was once the major organizing principle of the entire study. Genre is, however, only a small part of the study as it exists today.

Methodology Associated with Archival Data

The transcript of computer conferencing events stored on the IBMPC conferencing facility and associated IBMPCARC archive facility is used repeatedly in this study to obtain the messages that have been the jury of last resort in the observation, within genres, of characteristic messages and message patterns. Wherever it was felt a pattern was understood, the transcript was used to identify exemplars of that pattern. This is the largest use of archival data in this study.

Archival data has also been used, however, to aid understanding of the growth patterns associated with IBMPC and for obtaining a sample pool of IBMPC participants. The requisite data, in both cases, was obtained from IBMPC history files that record each electronic mail transaction associated with the conferencing facility. This history file contains information concerning the type of action taken, the forum (or other file) the action was taken against, and the network identity (in the form of the userid and node) of the person who took the action.

When used for the purpose of generating a sample pool, the userids and nodes of a large number of IBMPC participants were collected in a file. This file was enhanced, in 1988, with the addition of information concerning the geographical location of the node and the level of IBMPC use associated with the userid and node. The information collected in this manner allowed for both the construction of a stratified random sample of IBMPC participants and the generation of some useful demographic information. Both sets of information will be explained in somewhat greater detail later in this chapter. Analysis of this data was restricted to frequency distributions and, where the 1986 and 1988 samples were compared, to chi-square comparisons of 1986-based expectations with the 1988 reality.
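The construction of a stratified random sample from such a file can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the record layout, the function name `stratified_sample`, the per-stratum quotas, and the synthetic pool are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(pool, stratum_of, quotas, seed=0):
    """Draw a fixed number of members at random from each stratum.

    pool       -- list of participant records (hypothetical layout)
    stratum_of -- function mapping a record to its stratum label
    quotas     -- dict: stratum label -> number of members to draw
    """
    rng = random.Random(seed)  # seeded so that the draw is repeatable
    by_stratum = defaultdict(list)
    for record in pool:
        by_stratum[stratum_of(record)].append(record)

    sample = []
    for label, quota in quotas.items():
        members = by_stratum.get(label, [])
        # Never ask for more members than the stratum contains.
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample

# Synthetic pool of (userid, node, region) records; not real IBMPC data.
pool = [
    ("U%04d" % i, "NODE%d" % (i % 7), "Europe" if i % 5 == 0 else "United States")
    for i in range(1000)
]
quotas = {"United States": 30, "Europe": 20}  # hypothetical quotas
sample = stratified_sample(pool, lambda record: record[2], quotas)
```

Drawing a fixed quota from each stratum, rather than a fixed fraction, is what allows small groups to be over-represented relative to their share of the pool.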

When used for the purpose of understanding the growth patterns associated with IBMPC, the history data was extracted into three series of weekly entries. One series documented the number of individual appends that were added to IBMPC each week. A second series documented the number of individuals who made appends to IBMPC each week. The third series documented the number of individual files on IBMPC which changed each week, either through the addition of appends, replacement of the entire file, or the creation of a new file.
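The extraction of the three weekly series can be sketched as below. The record layout (a date followed by action type, file, userid, and node), the action names "append", "replace", and "create", and the function name `weekly_series` are assumptions made for illustration; the actual format of the history file is not reproduced here.

```python
from collections import defaultdict
from datetime import date

CHANGE_ACTIONS = ("append", "replace", "create")  # hypothetical action names

def weekly_series(history):
    """Summarize history records into one row per ISO week.

    Each row is (week, appends, contributors, files_changed), where week
    is an (ISO year, ISO week number) pair.
    """
    appends = defaultdict(int)
    contributors = defaultdict(set)
    files_changed = defaultdict(set)
    for day, action, forum, userid, node in history:
        iso = day.isocalendar()
        week = (iso[0], iso[1])
        if action in CHANGE_ACTIONS:
            files_changed[week].add(forum)
        if action == "append":
            appends[week] += 1
            contributors[week].add((userid, node))
    return [
        (week, appends[week], len(contributors[week]), len(files_changed[week]))
        for week in sorted(files_changed)
    ]

# Synthetic records for illustration only.
history = [
    (date(1988, 1, 4), "append", "FORUM-A", "u1", "n1"),
    (date(1988, 1, 6), "append", "FORUM-A", "u1", "n1"),
    (date(1988, 1, 6), "append", "FORUM-B", "u2", "n1"),
    (date(1988, 1, 12), "create", "FORUM-C", "u3", "n2"),
]
rows = weekly_series(history)
```

In this sketch a contributor is identified by the userid and node together, so the same userid on two nodes counts as two individuals.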

Analysis of this data combined curve fitting, regressive spectral time series analysis (Jim Watt's FATS procedure; no reference available), and multiple regression analysis. The curve fitting was used to trace the growth of conferencing activity, and was performed by plotting the data in each set with Lotus FreeLance. On the basis of the curve fitting, two linear components, collectively accounting for 85% to 95% of the variance in each of the three measures, were identified. The results of this analysis will be reported in the chapter on the growth of computer conferencing in IBM.

The regressive spectral time series was performed to trace the impact of periodic variation in IBMPC use, and was applied to the residuals of each measure after removing the two linear components. Multiple regression was used to test the results of these two analyses together and to identify residuals for subsequent analysis. Upwards of twenty significant periodic components were extracted for each variable, with the pattern of extracted components suggesting the existence of one or more chaotic attractors in the data set. None stand out from the crowd. Hence even though we are fairly confident of one, three, and six month periodic components in the data, they will not be reported in this study.
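The general idea of removing a trend and then searching the residuals for periodic components can be illustrated with a short sketch. It is not the FATS procedure, for which no reference is available: it substitutes an ordinary least-squares line and a plain periodogram, fits a single linear component rather than two, and runs on a synthetic weekly series. All function names are hypothetical.

```python
import cmath
import math

def linear_fit(series):
    """Ordinary least-squares line through (0, y0), (1, y1), ..."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def periodogram(residuals):
    """Map each Fourier period (in weeks) to its power in the residuals."""
    n = len(residuals)
    power = {}
    for k in range(1, n // 2 + 1):
        coef = sum(
            r * cmath.exp(-2j * math.pi * k * t / n)
            for t, r in enumerate(residuals)
        )
        power[n / k] = abs(coef) ** 2 / n
    return power

# Synthetic series: linear growth plus a 13-week (roughly quarterly) cycle.
series = [5 + 2 * t + 3 * math.sin(2 * math.pi * t / 13) for t in range(104)]
slope, intercept = linear_fit(series)
residuals = [y - (intercept + slope * t) for t, y in enumerate(series)]
power = periodogram(residuals)
dominant_period = max(power, key=power.get)
```

With real data many periods may carry comparable power, which is the situation described above in which no single component stands out.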

Methodology for the Questionnaire Survey

Two formal surveys of IBMPC participants were conducted. The first was conducted during the early summer of 1986; the second during the summer of 1988. A third, informal survey of participants has been conducted continuously since early 1985. The content of this third survey is composed entirely of open-ended responses to a single question. It will be reported on in the chapters on genres of computer conference and the benefits of computer conferencing. The formal surveys are more conventional in form. Although administered electronically, they are composed of a variety of questions administered in several formats. Although the two formal surveys were very similar, they were not identical, either in content or administration. Specifically:

The Samples

Too much can be made of this last difference. The sample pool, in both surveys, included contributors to the IBMPC computer conferencing facility, subscribers to the facility, people who performed get requests against the facility, and people who linked to the facility, with sample data drawn from both the IBMPC master and from various IBMPC shadows. Still, the surveys do differ in both the scale of the sample pool and the mechanics of the survey distribution.

Scale of the sample pool

In the 1986 survey a pool of 300 potential respondents was used as both the sample and sample pool. In the 1988 survey, a sample of 600 was drawn from a sample pool of 13,595 IBMPC users. This difference in scale allowed more attention to be given to demographics. The sample demographics grouped users on two dimensions: geographic location and IBMPC use. It was not possible, and no attempt was made, to balance these two dimensions. Hence while some demographic groups are over-represented relative to others, every geographical and usage grouping had at least forty members in the sample selected.

Only three geographical groups were measured in 1986: United States, Europe, and other. Obvious increases in IBMPC use in New Zealand, Australia, Japan and elsewhere led to the use of six distinct geographical groups in 1988, overviewed in the following table:

Geographical Location      1988                                                                1986
                           Size of   Percentage  Number   Percentage  Percentage  Percentage
                           Sample    of Sample   Sampled  of Sample   of Pool     of Returns
                           Pool      Pool                             Selected
United States              10273     75%         326      54%         3%          62%          94%
Canada                     447       3%          49       8%          11%         8%
Europe                     2151      15%         91       15%         4%          16%          3%
Central and South America  169       1%          42       7%          25%         5%           3%
Asia and Pacific           468       3%          51       8%          11%         6%
Middle East and Africa     87        1%          41       7%          47%

Overview of Geographic Demographics: the geographical demographics of the 1986 and 1988 surveys

A chi-square comparison of the 1988 survey returns against the expected returns by geographical demographics shows no significant differences (X**2=8.98 with 5 degrees of freedom), indicating that the geographic groups respond to the survey in roughly the same percentage in which they are sampled. Not unexpectedly, a chi-square comparison of the 1988 survey return geographical demographics against the expected returns based on 1986 survey return percentages shows a very significant change (X**2=332.95 with 2 degrees of freedom, p<.00001). While some of this change is probably due to changes in the sampling technique, it seems likely that much of the difference is the result of increased participation in the IBMPC computer conferencing facility among non-U.S. IBM employees.
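The form of these comparisons is the standard goodness-of-fit statistic, X**2 = sum((observed - expected)**2 / expected), with one fewer degrees of freedom than there are groups. A minimal sketch follows. The counts and expected shares in it are hypothetical, chosen only to show the arithmetic; they are not the survey returns, and the function name `chi_square` is an assumption.

```python
def chi_square(observed, expected_shares):
    """Goodness-of-fit chi-square of observed counts against expected shares.

    observed        -- list of counts, one per group
    expected_shares -- list of expected proportions (summing to 1)
    Returns the statistic and its degrees of freedom.
    """
    total = sum(observed)
    statistic = 0.0
    for count, share in zip(observed, expected_shares):
        expected = total * share
        statistic += (count - expected) ** 2 / expected
    return statistic, len(observed) - 1

# Hypothetical returns for three groups, with expected shares of 50%, 30%, 20%.
observed = [60, 25, 15]
shares = [0.5, 0.3, 0.2]
statistic, df = chi_square(observed, shares)
```

Here the expected counts are 50, 30, and 20, so the statistic is 100/50 + 25/30 + 25/20, or roughly 4.08 with 2 degrees of freedom.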

Participants were assigned to the first of these classes to which they belonged, on the assumption that each level of action represented a greater commitment to IBMPC use than any subsequent level, as seen in the following table:

Usage Categories           1988                                                                1986
                           Size of   Percentage  Number   Percentage  Percentage  Percentage
                           Sample    of Sample   Sampled  of Sample   of Pool     of Returns
                           Pool      Pool                             Selected
Extremist                  84        1%          40       7%          48%         4%
Daily                      156       1%          40       7%          26%         4%
Weekly                     703       5%          44       7%          6%          19%
Monthly                    1786      13%         58       10%         3%          22%
Occasional                 5204      38%         157      26%         3%          22%
Non-Contrib                                                                       23%
Subscribers                1943                  92                   5%
Informs                    656                   46                   7%
Getters                    196                   40                   20%
Linkers                    2866                  83                   3%

Overview of IBMPC Usage Demographics: the usage demographics of the 1986 and 1988 surveys

Mechanics of Survey Distribution

The mechanics of the survey distribution also differ somewhat in 1986 and 1988. In the 1986 survey, a letter was sent to members of the sample pool asking if they wished to participate in the survey and, if so, what form they wished to receive it in. Three forms were available: paper (a printed copy of the questionnaire sent by mail), softcopy (an electronic version of the hardcopy questionnaire, delivered and returned via electronic mail), and executable (a program, runnable on the IBM PC under PC-DOS, which presented the survey and saved the answers to disk). The executable was delivered, and the resulting output file returned, via electronic mail. Questionnaires were sent out as requests were received.

Pre-tests of the three forms of questionnaire indicated that the executable questionnaire was much easier to take than the softcopy or hardcopy. Users generally finished the executable questionnaire in 15 minutes, but required 20-30 minutes to complete the paper and softcopy versions. People also seemed to like the executable version better, although no formal measurement was made of this. Users had no way of knowing these results when the 1986 query letter was mailed out. Nonetheless, all but three of the 150 people who expressed interest in completing the survey requested the executable version, perhaps because of the novelty.

Similar enthusiasm for the executable form of the questionnaire was observed as the completed questionnaires were received. Indeed, a number of requests for the executable questionnaire vehicle were subsequently received from others who wished to conduct questionnaires. Answers returned from the executable questionnaire were also much easier to merge with the data set, as the returned data could be merged with the data set automatically, without direct keying. The first answer files from the executable started to appear within hours of the initial shipment of the executable version.

These results so strongly favored the use of the executable version that, in 1988, the mechanics of distribution were changed. Indeed, only an executable version was prepared initially, and a softcopy version was only created after requests (3 in total) were made for a non-executable version. No query letter was sent in 1988. Instead, the executable version was mailed electronically, along with a cover letter, to the entire sample.

No follow-up letters were sent in either survey.

Statistical Method for the survey data

A variety of statistical methods were used in the data analysis. In each case, the form of the data and nature of the question dictated the statistical method chosen. Methods used included:

Frequency Distributions
The most frequently used form of reporting of the survey data used here will be simple exploration of the frequency distributions of various measures. Where questions involved categorical data or were not comparable to other questions in either survey, the data analysis frequently stopped here. Frequency data were obtained using the SAS FREQ procedure.
Chi-Squares
Where categorical data needed to be compared against a set of expectations, chi-squares were performed. We would have happily accepted the .05 level as an indicator of significant difference. In fact, however, every chi-square reported as significant in this analysis met the .00001 level. All chi-squares were performed using a spreadsheet. The basic spreadsheet template was tested in advance against examples with known results.
Means and Standard Deviations
Where data could be reasonably assumed to be interval, means and standard deviations were computed using the SAS MEANS procedure. This data was used for two purposes. First, where a number of measures used the same metrics, the means were used to rank measures. Second, and more important, the data was frequently used to perform T-Test comparisons.
T-Tests
Where the means of interval measures were compared, T-Tests were performed. Three forms of T-Test were required in the analysis. It was frequently necessary, in analysis of both the 1986 and 1988 data, to perform paired comparisons of measures on the same questionnaire that shared the same metric. Hence T-Tests compared the hours respondents spent on various computer applications and media, various impacts of IBMPC on the respondent, and other measures.

With nearly half of the 1988 questions repeated from 1986, it was also necessary to compare results between 1986 and 1988. In this case population T-Tests were used to compare the results for the two surveys.

The availability of demographic information like geographical location, job category, and IBMPC use allowed comparisons to be made between groups. This kind of analysis was restricted to the analysis of the 1988 survey questions concerning the value of specific features of IBMPC and the impact of IBMPC use on respondents.

The .05 level seemed a reasonable criterion of statistically significant difference. No T-Test accepted as significant in this analysis failed to meet the .03 level.

Spearman Rank Correlation
Where the rankings of means needed to be compared against a set of expected rankings, the Spearman Rank Correlation was used. Although the .05 level would have been acceptable, all tests that indicated significant differences met at least the .01 level of significance.
Factor Analysis
Several factor analyses were performed against the survey data. In each case the SAS FACTOR procedure was used to perform a principal components analysis, with factors extracted if their eigenvalues exceeded one. Once extracted, the factors were obliquely rotated using the SAS PROMAX option.
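The extraction rule used in the factor analyses (retain components of the correlation matrix whose eigenvalues exceed one) can be sketched with NumPy. This is an illustration of the rule only, not a reproduction of the SAS FACTOR procedure; the oblique rotation is not shown, and the function name and the synthetic data are assumptions.

```python
import numpy as np

def principal_components(data):
    """Principal components of the correlation matrix of `data`.

    data -- array with one row per respondent and one column per measure
    Returns all eigenvalues (largest first) and the loadings of the
    components whose eigenvalues exceed one.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]          # largest eigenvalue first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    keep = eigenvalues > 1.0                       # eigenvalue-greater-than-one rule
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    return eigenvalues, loadings

# Synthetic data: six measures driven by two underlying factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
noise = 0.5 * rng.normal(size=(200, 6))
data = latent[:, [0, 0, 0, 1, 1, 1]] + noise
eigenvalues, loadings = principal_components(data)
```

Because the eigenvalues of a correlation matrix sum to the number of measures, an eigenvalue above one marks a component that accounts for more variance than any single standardized measure does.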

Methodology for the Numerical Taxonomy

The methods by which typologies can be constructed have been well established in fields like biology and medicine (Sneath and Sokal, 1973; Jardine and Sibson, 1971), where the construction of good typologies has proven central to the construction of theory. Sneath and Sokal (1973, p. 5) describe the general methodology of numerical taxonomy as follows:

organisms and characters are chosen and recorded; the resemblances between organisms are calculated; taxa are based upon these resemblances; and last, generalizations are made about the taxa.

This general method will be followed in constructing a typology of communications media:

The Cluster Analysis

The cluster analysis performed here will use a squared euclidean distance matrix based on standardized variables (to assure equal weighting). Because cluster analysis is an exploratory technique whose results can vary somewhat depending on the method of clustering selected, clusters will be formed from the distance matrix using two different methods of clustering:

It should be expected that these contrasting methods of clustering will yield results that are similar, but not identical. The differences may be as interesting as the similarities, however, as they may reveal forms of communication that straddle the lines between broader classes of media.
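The distance matrix from which the clusters are formed can be sketched as follows. The media named and the characteristic scores assigned to them are hypothetical, as are the function names; the sketch shows only how standardizing each variable before computing squared euclidean distances gives the variables equal weight.

```python
import statistics

def standardize(cases):
    """Rescale each variable (column) to mean 0 and standard deviation 1."""
    columns = list(zip(*cases))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in cases]

def squared_euclidean(cases):
    """Squared euclidean distance between every pair of standardized cases."""
    z = standardize(cases)
    return [[sum((a - b) ** 2 for a, b in zip(r1, r2)) for r2 in z] for r1 in z]

# Hypothetical cases (media) scored on three hypothetical characteristics.
media = ["letter", "telephone", "computer conference", "broadcast"]
scores = [
    [1.0, 5.0, 2.0],
    [5.0, 1.0, 4.0],
    [3.0, 4.0, 5.0],
    [2.0, 2.0, 1.0],
]
distances = squared_euclidean(scores)
```

Multiplying any one column of `scores` by a constant leaves `distances` unchanged, which is the sense in which standardization assures equal weighting.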

The Factor Analysis

The factor analysis performed here will be a principal components analysis. The resulting factors will be rotated obliquely to maximize the differences between factors and assure interpretability.

This same factor analysis method will be applied to the data set three times. The first analysis will examine the complete set of cases. The other two analyses will examine subsets of the cases. The first subset will be of traditional media. The second subset will include both traditional media and technologically mediated communications systems. The complete set, of course, also examines computer mediated systems. These three analyses will allow us to see if computer media change any of our more basic assumptions concerning communication.

To make the job of visualizing these relationships easier, the factors identified in the factor analysis will also be varimax rotated to assure an orthogonal solution. Factor score coefficients will be obtained on the basis of the varimax rotation and factor scores computed for each case.
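For readers unfamiliar with the varimax rotation, a minimal sketch of one common iterative formulation is given below, using NumPy. It is not the routine used in this study; the function name, the convergence settings, and the unrotated loadings are assumptions made for illustration, and the computation of factor score coefficients is not shown.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a (variables x factors) loading matrix.

    Returns the rotated loadings and the rotation matrix applied.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        gradient = loadings.T @ (
            rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt  # nearest orthogonal matrix to the gradient
        previous, criterion = criterion, s.sum()
        if previous and criterion < previous * (1 + tol):
            break
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings for six variables on two factors.
unrotated = np.array([
    [0.7, 0.4],
    [0.6, 0.5],
    [0.8, 0.3],
    [0.5, -0.6],
    [0.6, -0.5],
    [0.4, -0.7],
])
rotated, rotation = varimax(unrotated)
```

Because the rotation matrix is orthogonal, the factors remain uncorrelated and the sum of squared loadings for each variable is the same before and after rotation.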

The Cluster Analysis versus the Factor Analysis

The decision to perform both factor analysis and cluster analysis is an unusual one. Analysts tend to use one or the other. Although the methods are different, they complement each other in important respects. When used together, it becomes possible to gain insights concerning the nature of factors and relationship of clusters that might not be observed otherwise.

The methods differ both in what they measure and the form of the data they are most typically applied to. Factor analysis is a measure of similarity. Factors are determined on the basis of correlations, a measure of similarity, and the highest value in the correlation matrix is a variable's correlation with itself (perfect or 1.0). Hence factor analysis throws away information about the uniqueness inherent to the items analyzed. Cluster analysis, by contrast, is fundamentally a measure of difference. Clusters are determined on the basis of the "distance" between cases, and the lowest value in the distance matrix is a case's distance from itself (none or 0.0). Note that this distance is fundamentally a ratio measure. Hence cluster analysis often throws away less information than factor analysis does.

This difference may be the most important. Factor analysis is, in consequence of its scaling, sensitive to both case selection and variable choices. Sufficiently similar variables will assure the emergence of related "dimensions". The addition of a few cases can result in substantial changes in the "underlying" characteristics found. Cluster analysis, by contrast, is fairly insensitive to either highly similar variables or the addition of cases. So long as characteristics are weighted equally, the technique is only marginally sensitive to the addition of a new characteristic, even when that characteristic contrasts strongly with the others. New cases may, moreover, result in the formation of new clusters, but they will rarely result in major changes to existing groupings.

The more readily apparent difference between the techniques is the form of the data to which they are usually applied. Factor analysis is usually applied to data R-style, with characteristics (variables) clustered across cases (individuals). Cluster analysis, on the other hand, is usually applied Q-style, with cases (individuals) clustered across characteristics (variables). When used in this way, it is usually possible to relate the results of the two techniques. In the best case, groups revealed in the cluster analysis will inhabit distinct areas of the multi-dimensional space revealed by the factor analysis. When obtained, the resulting visualization serves both as a confirmation of the reasonableness of the two results and as a potential source of new insight.