
Detection and Analysis of First Appearances of the Scholarly Bibliographic References on Wikipedia Articles / 20230718

Presentation slide at 2nd AP-iNext workshop Scholarly Communication & Scholarly Data Mining

Jiro Kikkawa

July 18, 2023

Transcript

  1. 2nd AP-iNext workshop
    Scholarly Communication & Scholarly Data Mining
    Jiro Kikkawa
    [email protected]
    Detection and Analysis of First
    Appearances of the Scholarly
    Bibliographic References on
    Wikipedia Articles
    University of Tsukuba, Japan

  2. About Me: Jiro Kikkawa / 吉川 次郎
    • Assistant Professor at the University of Tsukuba
    – Institute of Library, Information and Media Science
    • Ph.D. (Library Information Science)
    – received from the University of Tsukuba in March 2021
    • Research interests
    – Scholarly communication, Bibliometrics, and Digital library
    – I have been analyzing scholarly bibliographic references
    on Wikipedia since I was a graduate student.
    • For more details, please visit https://researchmap.jp/jir_o?lang=en


  3. Overview
    • I introduce my research project to identify and analyze
    scholarly bibliographic references on Wikipedia
    – based on the following two papers
    www.nature.com/scientificdata
    Data Descriptor (Open)
    Dataset of first appearances of the scholarly bibliographic references
    on Wikipedia articles
    Jiro Kikkawa ✉, Masao Takaku & Fuyuki Yoshikane
    Referencing scholarly documents as information sources on Wikipedia is important because it supports
    or improves the quality of Wikipedia content. Several studies have been conducted regarding scholarly
    references on Wikipedia; however, little is known of the editors and their edits contributing to add
    the scholarly references on Wikipedia. In this study, we develop a methodology to detect the oldest
    scholarly reference added to Wikipedia articles by which a certain paper is uniquely identifiable as the
    “first appearance of the scholarly reference.” We identified the first appearances of 923,894 scholarly
    references (611,119 unique DOIs) in 180,795 unique pages on English Wikipedia as of March 1, 2017
    and stored them in the dataset. Moreover, we assessed the precision of the dataset, which was highly
    precise regardless of the research field. Finally, we demonstrate the potential of our dataset. This
    dataset is unique and attracts those who are interested in how the scholarly references on Wikipedia
    grew and which editors added them.
    Background & Summary
    Along with the digitization of scholarly communication, numerous scholarly documents have been referenced
    and used on the Web. One of the changes arising from the development and dissemination of scholarly
    information infrastructures on the Web is the utilization of scholarly documents by various people and communities,
    including readers other than traditional ones such as researchers and specialists. As such an example, there are
    many references and accesses to scholarly documents via Wikipedia. In particular, according to Crossref, which
    assigns Digital Object Identifiers (DOIs) to scholarly documents massively, Wikipedia is one of the largest
    referrers of Crossref DOIs as of 2015 [1].
    Wikipedia is a free online encyclopedia that anyone can edit, and it has been one of the most visited websites
    in the world. However, owing to its collaborative nature, much criticism and discussion have emerged since its
    start with regard to the accuracy and reliability of its contents. Three core content policies exist in Wikipedia:
    “verifiability,” “neutral point of view,” and “no original research.” Referencing scholarly documents as
    information sources on Wikipedia complements these policies, as these cited sources support or improve the quality of
    Wikipedia content.
    Several studies have been conducted regarding scholarly bibliographic references on Wikipedia; however,
    most of them have focused on the scholarly document itself [2–6]. The methodologies in previous studies used […]
    Kikkawa, Jiro; Takaku, Masao; Yoshikane, Fuyuki: "Dataset of first
    appearances of the scholarly bibliographic references on Wikipedia
    articles", Scientific Data, Vol. 9, Article No. 85, pp. 1-11, 2022.
    https://doi.org/10.1038/s41597-022-01190-z
    Time Lag Analysis of Adding Scholarly References to English Wikipedia:
    How Rapidly Are They Added to and How Fresh Are They?
    Jiro Kikkawa (✉), Masao Takaku, and Fuyuki Yoshikane
    University of Tsukuba, Tsukuba, Ibaraki, Japan
    {jiro,masao,fuyuki}@slis.tsukuba.ac.jp
    Abstract. Referencing scholarly documents as information sources on
    Wikipedia is important because they complement and improve the quality
    of Wikipedia content. However, little is known about them, such as
    how rapidly they are added and how fresh they are. To answer these questions,
    we conduct a time-series analysis of adding scholarly references to
    the English Wikipedia as of October 2021. Consequently, we detect no
    tendencies in Wikipedia articles created recently to refer to more fresh
    references because the time lag between publishing the scholarly articles
    and adding references of the corresponding paper to Wikipedia articles
    has remained generally constant over the years. In contrast, tendencies
    to decrease over time in the time lag between creating Wikipedia articles
    and adding the first scholarly references are observed. The percentage of
    cases where scholarly references were added simultaneously as Wikipedia
    articles are created is found to have increased over the years, particularly
    since 2007–2008. This trend can be seen as a response to the policy
    changes of the Wikipedia community at that time that was adopted by
    various editors, rather than depending on massive activities by a small
    number of editors.
    Kikkawa, Jiro; Takaku, Masao; Yoshikane, Fuyuki: "Time Lag
    Analysis of Adding Scholarly References to English Wikipedia:
    How Rapidly Are They Added to and How Fresh Are They?",
    Proceedings of the 18th International Conference, iConference
    2023, Lecture Notes in Computer Science (LNCS), Vol. 13972,
    pp. 425-438, 2023. https://doi.org/10.1007/978-3-031-28032-0_33


  4. Background
    • Mass digitization of scholarly communication
    – Various kinds of communities and people, including readers other than
    traditional ones such as researchers and specialists, can utilize scholarly documents
    – Wikipedia offers numerous references and access to scholarly documents,
    and Wikipedia was one of the largest referrers of Crossref DOIs as of 2015
    • Wikipedia and scholarly bibliographic references
    – Wikipedia is a free online encyclopedia that anyone can edit, and
    one of the most visited websites in the world
    – Owing to its collaborative nature, its accuracy has drawn much criticism
    and discussion since its start
    – Scholarly bibliographic references on Wikipedia complement and
    improve the quality of Wikipedia content


  5. Background
    • Scholarly bibliographic references on Wikipedia
    complement and improve the quality of Wikipedia content
    Difficulties defining LIS
    "The question, 'What is library and information science?'
    does not elicit responses [...]
    Chua & Yang (2008) [10] studied papers published
    in Journal of the American Society for Information
    Science and Technology in the period 1988–1997 and
    found, among other things: "Top authors have grown in
    diversity from those being affiliated predominantly with
    library/information-related departments to include those
    from information systems management, information
    technology, business, and the humanities. […] "
    References
    1. Bates, M.J. and Maack, M.N. (eds.). (2010).
    Encyclopedia of Library and Information Sciences.
    Vol. 1–7. CRC Press, Boca Raton, USA. Also
    available as an electronic source.
    […]
    10. Chua, Alton Y.K.; Yang, Christopher C.
    (November 2008). "The shift towards multi-
    disciplinarity in information science".
    Journal of the American Society for Information
    Science and Technology. 59 (13): 2156–
    2170. doi:10.1002/asi.20929.
    Figure 1. Example of the scholarly reference on English Wikipedia.
    Library and information science - Wikipedia https://en.wikipedia.org/wiki/Library_and_information_science
    • 1,474,375 scholarly references on English Wikipedia as of October 2021
    • Who added these references to Wikipedia, and when?


  6. Analysis of scholarly references on Wikipedia
    • Most previous studies have focused on the scholarly document
    itself, and little is known about the editors and their contributions
    to adding scholarly references to Wikipedia.
    • Previous studies focused on the scholarly document itself:
    1. whether scholarly articles published in high-impact-factor journals
    tend to be referenced more often on Wikipedia [Nielsen, 2007; Teplitskiy, 2016]
    2. whether scholarly articles published in open access journals
    tend to be referenced more often on Wikipedia [Teplitskiy, 2016; Lin and Fenner, 2014;
    Pooladian and Borrego, 2017]
    3. whether the references on Wikipedia are usable as a data source
    for research evaluations [Kousha and Thelwall, 2017]
    4. investigations of the characteristics of Wikipedia articles with
    scholarly references [Pooladian and Borrego, 2017]
    5. investigations of references focused on specific identifiers
    (e.g., DOI, arXiv, ISSN, and ISBN) [Kikkawa, 2016; Kikkawa, 2020b; Halfaker and
    Taraborelli, 2019] or research fields [Thelwall, 2016; Pooladian and Borrego, 2017]
    Reference: Kikkawa, Jiro; Takaku, Masao; Yoshikane, Fuyuki: "Time Lag Analysis of Adding Scholarly References to English Wikipedia: How Rapidly Are
    They Added to and How Fresh Are They?", Proceedings of the 18th International Conference, iConference 2023, Lecture Notes in Computer Science
    (LNCS), Vol. 13972, pp. 425-438, 2023. https://doi.org/10.1007/978-3-031-28032-0_33


  7. Difficulties in detecting the first appearance #1
    Reference: Kikkawa, Jiro; Takaku, Masao; Yoshikane, Fuyuki: "Dataset of first appearances of the scholarly bibliographic references on
    Wikipedia articles", Scientific Data, Vol. 9, Article No. 85, pp. 1-11, 2022. https://doi.org/10.1038/s41597-022-01190-z
    • We define the term “first appearance of the scholarly reference” as
    - the oldest scholarly reference added to Wikipedia articles by which a certain
    paper is uniquely identifiable
    • We do not consider the role of each reference
    – For instance, references used as evidence for a certain part of the article content,
    those just mentioning the paper, and those listed in further reading are not
    distinguished.
    • If there are multiple references corresponding to the same paper on the
    same article, the oldest one is treated as the first appearance.
    • The most challenging part is that a scholarly reference may carry insufficient or
    incomplete information at the time of its first appearance, with
    more detailed information added in later revisions.


  8. Difficulties in detecting the first appearance #2
    • We define the term “first appearance of the scholarly reference” as
    - the oldest scholarly reference added to Wikipedia articles by which a certain
    paper is uniquely identifiable
    Figure 2A. The first appearance of the target papers on the article “Fair trade” on English Wikipedia
    • The first appearance in this case is A1, where an editor added the corresponding scholarly
    reference, including author name, publication year, paper title, and journal name, to the article.
    • Then, another editor reformatted it with the citation template in A2, and
    the DOI was added in A3.
    • In this case, we need to detect the first appearance by matching paper titles.
    Article title: Fair trade
    Target paper: Reed, D. (2009). What do Corporations have to do with Fair Trade? Positive and Normative Analysis from a Value Chain Perspective.
    Journal of Business Ethics, 86, 3–26. https://doi.org/10.1007/s10551-008-9757-5
    Corresponding scholarly reference on the article, by sample number and revision timestamp:
    • Before A1: (not exist)
    • A1 (2011-05-05 13:35:01 UTC):
    * Reed, D. (2009). What do Corporations have to do with Fair Trade? Positive and normative analysis from a value chain perspective. Journal of Business Ethics , 86:3-26, , p. 12)
    • A2 (2016-06-26 09:48:41 UTC):
    {{cite journal | last1 = Reed | first1 = D | year = 2009 | title = What do Corporations have to do with Fair Trade? Positive and normative analysis from a value chain perspective | url = | journal = Journal of Business Ethics | volume = 86 | issue = | pages = 3–26 [12] }}
    • A3 (2016-06-26 09:49:39 UTC):
    […] {{cite journal | last1 = Reed | first1 = D | year = 2009 | title = What do Corporations have to do with Fair Trade? Positive and normative analysis from a value chain perspective | url = | journal = Journal of Business Ethics | volume = 86 | issue = | pages = 3–26 [21] | doi=10.1007/s10551-008-9757-5}}

  9. Difficulties in detecting the first appearance #3
    Figure 2B. The first appearance of the target papers on the article “Solomon Islands” on English Wikipedia
    • The first appearance in this case is B1, where an editor initially added just the URI
    containing the PubMed ID (PMID) to this article.
    • Then, the paper title and author names were added and the reference was reformatted
    with the citation template in B2.
    • Additional information, including the DOI, was added in B3.
    • In this case, we need to detect the first appearance by matching PubMed IDs.
    Article title: Solomon Islands
    Target paper: Norton, H. L., Friedlaender, J. S., Merriwether, D. A., Koki, G., Mgone, C. S., & Shriver, M. D. (2006). Skin and hair pigmentation variation
    in Island Melanesia. American Journal of Physical Anthropology, 130 (2), 254–268. https://doi.org/10.1002/ajpa.20343
    Corresponding scholarly reference on the article, by sample number and revision timestamp:
    • Before B1: (not exist)
    • B1 (2014-11-19 19:36:09 UTC):
    http://www.ncbi.nlm.nih.gov/pubmed/16374866
    • B2 (2014-11-19 22:23:48 UTC):
    {{cite web | url=http://www.ncbi.nlm.nih.gov/pubmed/16374866 | title=Skin and hair pigmentation variation in Island Melanesia. | author=Norton HL , et al. | publisher= | accessdate=19 November 2014}}
    • B3 (2015-03-29 08:18:34 UTC):
    {{cite journal | last1=Norton HL | first1=et al | title=Skin and Hair Pigmentation Variation in Island Melanesia. | journal=MedLine | date=June 2006 | volume=130 | issue=2 | page=254 | accessdate=4 December 2014 | doi=10.1002/ajpa.20343}}


  10. Proposed method to detect the first appearances #1
    1. We extracted DOI links referenced in main namespace articles along with their article
    IDs and article titles on English Wikipedia by using Wikipedia dump files
    2. We obtained Crossref metadata for each DOI via the Crossref REST API
    3. We obtained other identifiers corresponding to each DOI, such as PubMed identifiers
    (PMID & PMCID), by using the Entrez Programming Utilities, etc.
    4. We stored article IDs, article titles, DOIs, and other identifiers; Crossref metadata;
    and research fields for each reference as the basic dataset.
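
    Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the regular expression `DOI_PATTERN` and the helper names `extract_dois` and `crossref_works_url` are assumptions made for this example. Only the `https://api.crossref.org/works/{doi}` endpoint comes from the public Crossref REST API, and the sketch builds the request URL without sending the request.

```python
import re
from urllib.parse import quote

# Assumed pattern: "10.", a 4-9 digit registrant code, "/", then a suffix
# that runs until whitespace or a wikitext delimiter.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s|}\]<>"]+')


def extract_dois(wikitext: str) -> list[str]:
    """Return the unique DOIs (lowercased) found in a wikitext string."""
    seen: dict[str, None] = {}
    for match in DOI_PATTERN.finditer(wikitext):
        doi = match.group(0).rstrip(".,;")   # drop trailing punctuation
        seen.setdefault(doi.lower(), None)   # DOIs are case-insensitive
    return list(seen)


def crossref_works_url(doi: str) -> str:
    """Build the Crossref REST API URL that returns metadata for one DOI."""
    return "https://api.crossref.org/works/" + quote(doi, safe="/")


wikitext = (
    "{{cite journal | last1 = Reed | first1 = D | year = 2009 "
    "| journal = Journal of Business Ethics | volume = 86 "
    "| doi=10.1007/s10551-008-9757-5}}"
)
dois = extract_dois(wikitext)
urls = [crossref_works_url(d) for d in dois]
print(urls)
```

    Lowercasing the DOI before deduplication keeps the same paper from being counted twice when editors write it with different capitalization.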
    Figure 3A. Data creation workflows. (1) Building the basic dataset
    [Diagram labels: Wikipedia Dump Data; Article ID; DOI; Crossref Metadata; Paper title; Other Identifiers; ISSN; Research Field; Basic Dataset]


  11. Proposed method to detect the first appearances #2
    Figure 3B. Data creation workflows. (2) Building the first appearance dataset
    1. We extracted all revision histories corresponding to article IDs in the basic dataset,
    together with article texts by using Wikipedia Dump files.
    2. We extracted identifiers and paper titles from the basic dataset, and detected the
    candidates of the first appearance for each scholarly reference as follows:
    A) One or more identifiers included in the article text.
    B) Either the full title of the paper or the first 5 words of the title is included in the article text.
    C) The similarity score based on the edit distance between the two paper titles from the
    basic dataset and from the extracted citation on the article is equal to or lower than the
    given threshold.
    3. We selected the oldest revision among the candidates as the first appearance.
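
    The candidate rules A to C and the final selection can be sketched as below. This is a hedged illustration under stated assumptions, not the published implementation: the `Revision` class, its `cited_titles` field (titles that some citation parser would extract), the normalization of the edit distance by the longer title, and the threshold value 0.2 are all hypothetical choices made for this example. The first revision in the sample data is made up; the other two reuse the A1 and A3 timestamps from the “Fair trade” example.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    timestamp: str                      # ISO 8601 UTC, so string order is time order
    text: str                           # article wikitext at this revision
    cited_titles: tuple[str, ...] = ()  # hypothetical: titles parsed from citations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1]; 0 means identical strings (assumed score)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def is_candidate(rev: Revision, identifiers, title: str, threshold: float) -> bool:
    text = rev.text.lower()
    wanted = title.lower()
    # A) one or more identifiers occur in the article text
    if any(identifier.lower() in text for identifier in identifiers):
        return True
    # B) the full title or its first five words occur in the article text
    first_five = " ".join(wanted.split()[:5])
    if wanted and (wanted in text or first_five in text):
        return True
    # C) an extracted citation title is within the distance threshold
    return any(normalized_distance(wanted, cited.lower()) <= threshold
               for cited in rev.cited_titles)


def first_appearance(revisions, identifiers, title, threshold=0.2):
    """Return the oldest candidate revision, or None if there is none."""
    candidates = [r for r in revisions
                  if is_candidate(r, identifiers, title, threshold)]
    return min(candidates, key=lambda r: r.timestamp, default=None)


TITLE = ("What do Corporations have to do with Fair Trade? Positive and "
         "Normative Analysis from a Value Chain Perspective")
IDENTIFIERS = ["10.1007/s10551-008-9757-5"]
revisions = [
    # Made-up earlier revision without the reference.
    Revision("2011-05-05T13:00:00Z", "Fair trade is a trading arrangement."),
    Revision("2011-05-05T13:35:01Z",
             "* Reed, D. (2009). What do Corporations have to do with Fair "
             "Trade? Positive and normative analysis from a value chain "
             "perspective. Journal of Business Ethics"),
    Revision("2016-06-26T09:49:39Z",
             "{{cite journal | last1 = Reed | doi=10.1007/s10551-008-9757-5}}"),
]
found = first_appearance(revisions, IDENTIFIERS, TITLE)
print(found.timestamp if found else None)
```

    In this sample the DOI only appears in the last revision, so rule A alone would date the reference to 2016; the title match in rule B is what recovers the 2011 revision.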
    [Diagram labels: Basic Dataset; Article ID; Identifiers; Paper title; Wikipedia Dump Data; Revisions (Revision 1, Revision 2, ..., Revision n); First appearance of the paper on the article; First Appearances Dataset]


  12. Dataset of first appearances on English Wikipedia articles
    as of 1 October 2021
    Reference: Kikkawa, Jiro; Takaku, Masao; Yoshikane, Fuyuki: “Dataset of first appearances of the scholarly bibliographic references
    on English Wikipedia articles as of 1 March 2017 and as of 1 October 2021”. Zenodo (2021). https://doi.org/10.5281/zenodo.5595573
    • By using the proposed method, we built and published the dataset of first
    appearances of scholarly bibliographic references on English Wikipedia as of
    1 October 2021. We identified the first appearances of 1,474,375 scholarly
    references (1,010,834 unique DOIs) in 313,240 unique articles
    • We evaluated the precision for detecting the first appearance, which was
    93.3% as a whole and exceeded 90% in 20 out of 22 ESI research fields.
    • Please play with this dataset :D


  13. Example of analysis of this dataset #1
    Figure 4. Monthly plot of the time-series transitions for the total number of references added on
    English Wikipedia articles.
    [Plot: x-axis 2001 to 2021; y-axis 0 to 70,000; series User, Bot, and IP; spikes marked A, B, and C]
    • The spikes at A, B, and C in Figure 4 are each caused by the activity of a single editor
    • A and B: ProteinBoxBot, a bot editor that automatically adds scholarly references related to
    molecular and cellular biology at a large scale.
    https://en.wikipedia.org/wiki/User:ProteinBoxBot
    • C: Yeast2Hybrid, a human editor who, according to his profile page, holds a PhD and is a
    bioinformatician in France. https://en.wikipedia.org/wiki/User:Yeast2Hybrid
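
    A monthly count per editor type, as plotted in Figure 4, could be computed along these lines. The record layout (a timestamp plus an editor type of "User", "Bot", or "IP") and the function name `monthly_counts` are assumptions for illustration, and the sample records are made up; they are not rows from the published dataset.

```python
from collections import Counter

# Made-up sample records for illustration only: (timestamp, editor type).
records = [
    ("2008-03-02T11:20:00Z", "Bot"),
    ("2008-03-15T08:05:00Z", "Bot"),
    ("2008-03-20T19:42:00Z", "User"),
    ("2008-04-01T00:10:00Z", "IP"),
]


def monthly_counts(rows):
    """Count references per (YYYY-MM, editor type) pair."""
    counts = Counter()
    for timestamp, editor_type in rows:
        counts[(timestamp[:7], editor_type)] += 1  # first 7 chars are YYYY-MM
    return counts


counts = monthly_counts(records)
for (month, editor_type), n in sorted(counts.items()):
    print(month, editor_type, n)
```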


  14. Example of analysis of this dataset #2-1
    Time lag between the creation date of each Wikipedia article and the
    date of adding the first scholarly reference to the corresponding article
    Example: the Wikipedia article “Spyware”
    • Date the Wikipedia article was created: 2001-11-22 16:37:56 UTC
    • Date the first scholarly reference was added to this article: 2016-08-06 16:05:57 UTC
    • Time lag: 5370.98 days (464,052,481 seconds ≈ 14.7 years)
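
    The time lag for the “Spyware” example can be reproduced with the Python standard library. The helper name `parse_utc` and the use of a 365.25-day year for the conversion to years are assumptions of this sketch.

```python
from datetime import datetime, timezone

FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS UTC' string into an aware datetime."""
    naive = datetime.strptime(stamp.removesuffix(" UTC"), FORMAT)
    return naive.replace(tzinfo=timezone.utc)


created = parse_utc("2001-11-22 16:37:56 UTC")    # article creation
first_ref = parse_utc("2016-08-06 16:05:57 UTC")  # first scholarly reference

lag = first_ref - created
lag_seconds = int(lag.total_seconds())
lag_days = lag_seconds / 86_400
lag_years = lag_days / 365.25  # assumption: average year length

print(lag_seconds, round(lag_days, 2), round(lag_years, 1))
```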


  15. Example of analysis of this dataset #2-2
    A. 0 days and at the same time
    B. 0 days but not at the same time
    C. less than 1 month
    D. equal to or more than 1 month but less than 6 months
    E. equal to or more than 6 months but less than 1 year
    F. equal to or more than 1 year but less than 3 years
    G. equal to or more than 3 years but less than 5 years
    H. equal to or more than 5 years
    Figure 5. Distribution of the time lag between creating the Wikipedia articles and adding the first scholarly references for every
    2 years.
    [Chart: time lag between the creation date of each Wikipedia article and the date of adding the first scholarly reference to the corresponding article; x-axis 0% to 100%; grouped by the years in which the Wikipedia articles were created]
    • Regarding the group of “0 days and at the same time,” the percentage increased significantly
    from 2005–2006 to 2007–2008 (from 9.05% to 36.00%).
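
    The categories A to H above can be expressed as a small classifier. The exact boundaries are assumptions of this sketch: “at the same time” is read as a lag of exactly zero (the reference is in the revision that created the article), “0 days” as less than 24 hours, a month as 30 days, and a year as 365 days. The function name `classify_time_lag` is hypothetical.

```python
from datetime import timedelta

DAY = timedelta(days=1)
MONTH = timedelta(days=30)   # assumption: 1 month = 30 days
YEAR = timedelta(days=365)   # assumption: 1 year = 365 days


def classify_time_lag(lag: timedelta) -> str:
    """Map a non-negative time lag to one of the categories A to H."""
    if lag < timedelta(0):
        raise ValueError("time lag must not be negative")
    if lag == timedelta(0):
        return "A"  # 0 days and at the same time
    if lag < DAY:
        return "B"  # 0 days but not at the same time
    if lag < MONTH:
        return "C"  # less than 1 month
    if lag < 6 * MONTH:
        return "D"  # 1 month or more, less than 6 months
    if lag < YEAR:
        return "E"  # 6 months or more, less than 1 year
    if lag < 3 * YEAR:
        return "F"  # 1 year or more, less than 3 years
    if lag < 5 * YEAR:
        return "G"  # 3 years or more, less than 5 years
    return "H"      # 5 years or more


# The "Spyware" example from the previous slide: 464,052,481 seconds.
spyware_category = classify_time_lag(timedelta(seconds=464_052_481))
print(spyware_category)
```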


  16. Example of analysis of this dataset #2-3
    Time lag between the creation date of each Wikipedia article and the date
    of adding the first scholarly reference to the corresponding article
    • In 2005, a hoax claiming that the journalist John Seigenthaler had been a suspect in the
    assassinations of U.S. President John F. Kennedy and Robert F. Kennedy was added to his
    Wikipedia biography, which became a widely reported controversy.
    • Wikipedia Seigenthaler biography incident - Wikipedia
    https://en.wikipedia.org/wiki/Wikipedia_Seigenthaler_biography_incident
    • In 2006, Jimmy Wales declared that the Wikipedia community would shift its focus from quantity
    to the quality of its content.
    • The increase observed here could be seen as a response to this movement.


  17. Future directions of this project
    • Achievements
    ✓ A methodology that detects first appearances of scholarly
    bibliographic references on Wikipedia articles with high precision
    ✓ The dataset of English Wikipedia as of October 2021
    ✓ Time lag analysis of adding scholarly references to English Wikipedia
    • Future work
    – Classify each reference by its role, such as evidence for a
    certain part of the article content, a mere mention of the paper,
    or an entry in further reading.
    – Support adding more scholarly references by building a
    recommendation system that shows related scholarly articles to
    Wikipedia editors.
    – Support detecting and updating obsolete or problematic references, such
    as references to retracted papers.
