This data was used for the majority of the tables and figures in the paper. Specifically, the data for Tables II, III, IV, V, VI, VII, VIII, IX, XII, XIII and Figure 3 comes from the artifact summaries. Each row of the data set contains the fields listed below: the left column gives the column name as it appears in the data set, and the right column describes its value:
id: | The pipe's identifier, as defined by Yahoo!. Some of the pipes may have changed since we ran our scraper, but the id can still be used to view the pipe's structure and content on Yahoo!'s servers |
author: | An identifier for the author of the pipe |
prol: | A boolean value, 1 or 0, indicating if the author was flagged as a prolific author for the analysis |
authpipes: | The number of pipes within our sample created by the author |
createdate: | The date on which the pipe was created |
days: | The number of days between the creation of the author's earliest pipe in the sample and the creation of the current pipe |
config: | The number of user-setter modules in the pipe |
modules: | The number of modules in the pipe |
clones: | The number of times the pipe had been cloned at the time it was scraped from the repository |
l0: | Considering the community, this is the size of the cluster at level 0, where each pipe is in its own cluster (See Table I in the paper for cluster level descriptions) |
l1: | Considering the community, this is the size of the cluster containing this pipe at level 1 |
l2: | Considering the community, this is the size of the cluster containing this pipe at level 2 |
l3: | Considering the community, this is the size of the cluster containing this pipe at level 3 |
l4: | Considering the community, this is the size of the cluster containing this pipe at level 4 |
l5: | Considering the community, this is the size of the cluster containing this pipe at level 5 |
l6: | Considering the community, this is the size of the cluster containing this pipe at level 6 |
l7: | Considering the community, this is the size of the cluster containing this pipe at level 7 |
clustered: | The minimum level at which this pipe joined at least one other pipe in a cluster |
l0self: | Within-author clustering, this is the size of the cluster at level 0, where each pipe is in its own cluster. Note that all the within-author clusterings were only performed for the most prolific authors |
l1self: | Within-author clustering, this is the size of the cluster containing this pipe at level 1 (only computed for the most prolific authors) |
l2self: | Within-author clustering, this is the size of the cluster containing this pipe at level 2 (only computed for the most prolific authors) |
l3self: | Within-author clustering, this is the size of the cluster containing this pipe at level 3 (only computed for the most prolific authors) |
l4self: | Within-author clustering, this is the size of the cluster containing this pipe at level 4 (only computed for the most prolific authors) |
l5self: | Within-author clustering, this is the size of the cluster containing this pipe at level 5 (only computed for the most prolific authors) |
l6self: | Within-author clustering, this is the size of the cluster containing this pipe at level 6 (only computed for the most prolific authors) |
l7self: | Within-author clustering, this is the size of the cluster containing this pipe at level 7 (only computed for the most prolific authors) |
clusteredself: | Within-author clustering, this is the minimum level at which this pipe joined at least one other pipe in a cluster (only computed for the most prolific authors) |
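As an illustration of how the level columns relate to the clustered column, the following minimal sketch recomputes the minimum level at which a pipe shares a cluster with at least one other pipe, using the l0 through l7 sizes. It assumes the data is available as comma-separated text with a header row that uses the column names above; the sample rows are invented for illustration and are not taken from the data set. How the stored clustered value encodes a pipe that never joins a cluster is not stated here, so the sketch returns None in that case.

```python
import csv
import io

LEVELS = range(8)  # cluster levels 0 through 7


def first_clustered_level(row, suffix=""):
    """Return the lowest level whose cluster size exceeds 1, or None.

    Pass suffix="self" to use the within-author columns (l0self..l7self).
    Missing or empty columns are skipped (assumption: non-prolific authors
    may have no within-author values).
    """
    for level in LEVELS:
        value = row.get(f"l{level}{suffix}", "")
        if value not in ("", None) and int(value) > 1:
            return level
    return None


# Invented sample rows, for illustration only (assumed CSV layout).
SAMPLE = """id,author,prol,l0,l1,l2,l3,l4,l5,l6,l7
pipeA,auth1,1,1,1,1,4,4,9,9,20
pipeB,auth2,0,1,1,1,1,1,1,1,1
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
levels = [first_clustered_level(r) for r in rows]
print(levels)  # [3, None]
```

Comparing the recomputed level against the clustered column (and, with suffix="self", against clusteredself) is one way to sanity-check a local copy of the data.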
We perform a rolling diversity analysis for each of the most prolific authors, as described in Section VII: Analysis of the Most Prolific Authors. The data for Figure 2 and Tables X and XI comes from the rolling cluster analysis. For each author, we sort the pipes they created by date, and then determine the level at which each pipe was clustered when considering all the pipes created before it. Each row in the data set contains the following information: