Case | HBS Case Collection | November 2008 (Revised November 2008)
Cyworld: Creating and Capturing Value in a Social Network
by Sunil Gupta and Sangman Han
In May 2008, the new CEO of Cyworld, a social network company in Korea, had to decide how to create and capture value from his rapidly growing user base. Cyworld was founded in 1999, and in 2003 it was acquired by SK Telecom, a leading mobile service provider in Korea. By 2007, Cyworld had 21 million users and $95 million revenue-$65 million from paid items (music, virtual gifts, etc.), $15 million from mobile networking, and $15 million from advertising. The new CEO had to decide which of these three revenue sources he should focus on in the future and how this choice would influence the target customers, the service offerings, and the required capabilities.
Keywords: Customer Value and Value Chain; Consumer Behavior; Social and Collaborative Networks; Segmentation; Value Creation; South Korea;
Social networking services have the potential to mediate, complement, and transform the changing nature of political participation by providing an insight into the perceptions of the electorate as they observe and comment on those who represent them politically. The rise in prominence of social networking services has been observed for some time (Boyd & Ellison, 2007), and this, coupled with online engagement relevant to political debate, suggests that social networking has the potential to influence the political landscape (Elmer et al., 2007). However, it is only within the last decade that online social networks have become technically mature and socially established enough to play a noteworthy role in disseminating information and encouraging citizens to remain engaged in political activities (Park & Kluver, 2009). Williamson (2009) argues that this new campaign strategy has benefits for politicians, as it provides a broad assessment of public opinion and offers both politicians and constituents alike a contemporary method to communicate.
Leveraging the existing use and understanding of Web resources has ensured that hyperlinks have found a place beyond online curation as communication tools in social network services. Turow and Tsui (2008) postulated that hyperlinks within Web-mediated communication technology facilitate social networking among people, organizations, and nation-states. There are two ways in which hyperlinks assist users in moving from personal communities to broader social networks. Hyperlinks can act as communication channels, connecting people as they surf from one site to another, and, according to Adamic and Adar (2003), can establish links between homepages in the promotion of social/political agendas, events, and issues. Constructing hyperlinks on public Web spaces, such as visitor boards, has been used to draw attention to information that is not widely diffused but potentially noteworthy, and such an action can receive immediate and intensive response (Halavais, 2006). Warnick, Xenos, Endres, and Gastil (2005) state that the existence of hyperlinks permits a deeper cognitive engagement with peripheral aspects of the page while the decision to follow the hyperlink is considered. Moreover, links to external services, in practice, often include explanatory messages, and Fogg and Iizawa (2008) have emphasized that the networking ability of hypertext can be enhanced if augmented with contextual information. Therefore, hyperlinks can be seen to have evolved beyond an exclusively practical mechanism for navigating online content into a mechanism to inform and build relationships that can be utilized by end users with limited knowledge of the underlying protocols and infrastructure (Karan, Gimeno, & Tandoc, 2009).
Thelwall (2003) demonstrated that hyperlinks can lead to new sources of information; however, Park and Jankowski (2008) argue that a link is not a single construct but instead can facilitate separate activities with distinct implications for communication. Furthermore, Ackland, Gibson, Lusoli, and Ward (2010) formalized this definition and identified five principal social functions that hyperlinks can be said to perform: 1) Information Provision, 2) Network Strengthening, 3) Identity Building, 4) Audience Sharing, and 5) Message Amplification. Ackland et al. (2010) defined Information Provision as the practice of using hyperlinks to provide new sources of information, whereas Network Strengthening has been used to denote the establishment of linking actors within a network or the formation of new bonds. An Identity Building action is seen as recognizing the work of others and signaling endorsement, with Audience Sharing being performed to ease the transit of users through the Web (Ackland et al., 2010). Finally, Ackland et al. (2010) found that the increased use of messages by a vocal few elevated the presence of marginal political groups with disproportionately strong online support, and Message Amplification aims to model this behavior.
Link analysis in the context of social and political science has yielded insights into online communities of political actors. The findings of Soon and Kluver (2007) lend credence to the argument that the configuration of links within political webospheres can reveal the ideological landscape of parties and advocacy organizations, visualizing the structure of alliances among those who share similar interests, beliefs, or agendas. Focusing on the nature of hyperlinks embedded within individual politician or party sites, Foot and Schneider (2006), while examining the role of hyperlinks on campaign websites, found that both political parties and politicians have used hyperlinks to signal their endorsement of political issues. Similarly, Shumate and Dewitt (2008) discovered that hyperlinks were used to establish associations among nongovernmental organizations, and bipolar clusters were observed which illustrate the existence of the North–South divide. Hyperlinks can therefore be seen to represent an intentional communicative choice, particularly in the context of politically motivated services. For example, Biddix and Park (2008) examined a campus movement using hyperlink analysis, and the pattern of link connectivity among student organizations indicated that interaction was replicated offline. In this context, political-based hyperlinks have been regarded as a public acknowledgement of others and can reflect existing bonds or facilitate the construction of new networks.
Park, Thelwall, and Kluver (2005) highlighted the form of linking that occurs within a political environment online, which comprises outlinks from and inlinks to candidate websites. Outlinks are susceptible to change over time, reflecting shifting political allegiances between parties and organizations (Foot, Schneider, Dougherty, Xenos, & Larsen, 2003). Foot et al. (2003) observed that network choices differ according to political philosophy, with left-leaning parties showing an affinity for international services whereas parties identified as coming from the right exhibit a marked preference for domestic websites. Herold (2009) goes further by explaining how online services in Asia provide a platform that complements traditional sociocultural norms of participation and offer a means to engage constituents and the wider population, allowing political figures to enhance reputation and elevate prominence. Lee and Park (2010) use the example of Korean politician Geun-Hye Park to demonstrate this phenomenon. Geun-Hye Park was well regarded before the growth of online network services, but the use of social media gave supporters an accessible platform, and the number of inlinks to her online presence steadily increased when she stood as a presidential candidate in 2007 (Lee & Park, 2010).
However, the motivation to share links and amplify content is not limited to admiration. The appearance of Nick Griffin, chairman of the far-right British National Party, on the BBC's Question Time show in 2009 provoked an immediate response from users on microblogging service Twitter who found the decision to include the BNP figurehead on a prominent program questionable (Anstead & O'Loughlin, 2011). External events such as this have been found to increase online activism and, in the case of South Korea, this was demonstrated during the candlelight protests of May and June 2008, where the reintroduction of American beef imports potentially infected with Bovine Spongiform Encephalopathy (BSE) prompted protests to demand the cessation of a free trade agreement with the United States (Lee, Kim, & Wainwright, 2010). During this time online communities experienced increased usage from citizens commenting on political profile pages to voice their concerns (Park, Lim, Sams, Nam, & Park, 2011), and the fallout from the candlelight protests, and other contentious issues in South Korea, has shown how online networks can mobilize citizens. The use of online environments allows individuals disillusioned with political participation to engage in and influence public debate, and such online activism has the potential to play a part in shaping the development of government policy. Moreover, Ackland et al. (2010) argued that political messages containing hyperlinks can be greatly amplified, and that heavy posters have the ability to magnify political issues and further exaggerate presence to other users. Panagiotopoulos, Sams, Elliman, and Fitzgerald (2011) found that this usage by a vocal minority in both official and unofficial channels elevated the prominence of latent concerns that appear incongruous with mainstream opinion and are often characterized by calls for excessive retribution.
In a similar vein, Utz (2009) proposes that social networking services facilitate participatory democracy outside of official election periods, and largely unconstrained by budgetary concerns.
The use of hyperlinks and related statements to convey meaning in online message dialogs, particularly those with a political leaning, has been found to indicate behavior (Karan et al., 2009). Whilst relying solely on those comments that contain hyperlinks appears to unduly limit the available content to a small subset of the full sample, Park et al. (2005) found that the target of links may indicate trends within the broader sample of potential data. Link choices are rarely randomly constructed, particularly in the context of politically motivated services, and external events have been found to motivate users into sharing links to provide supporting information (Robertson, Vatrapu, & Medina, 2010). Moreover, the intention of the link when contributed to a political message board can, as Ackland et al. (2010) argued, be amplified. Access to such data therefore has the potential to offer a broader understanding of the impact of specific campaign strategies, to track the political process as it unfolds, and to provide a representative sample to examine how technologies can mediate democratic processes.
The study will examine the target of hyperlinks contained in online public dialogs to discover the services and issues that have gained prominence. The location of links will be determined to reveal the extent to which hyperlinks include international services. The World Wide Web is often discussed as a global resource, but the degree to which citizens engaging in national politics utilize this resource remains unclear. Data collection will be achieved by developing a software program to query social network message boards and retrieve the comment and details of the user in question. The political background of the recipient of each comment will be determined to discover which user groups engage with which parties. The purpose of link choices will be evaluated based on established criteria and, following this, machine-based learning algorithms will be applied to the sample of user submissions to classify the textual comment that typically accompanies hyperlinks in online public dialogs.
The social network profile pages of 130 Korean National Assembly Members were identified and the date parameters of the study were April 2008 – June 2009. April 2008 was chosen as a suitable start date as this reflected the most recent election and appointment of National Assembly Members. The profile pages were maintained on Korea's most prominent social networking service, Cyworld.com (Kim & Yun, 2007).
To determine the type and content of links in user comments, all submissions posted to the sample of 130 politicians on Cyworld were collected using a Java-based eResearch tool. An HTTP call is made to request a single page of comments from a politician's visitor board. Comments on Cyworld are paginated and each visitor page contains a maximum of five submissions. An HTML page is returned and the content and date are isolated and held in temporary storage. The developed system then requests the next page of comments and performs the same actions, iterating until the target date parameters have been met. Following this, the content of submissions is evaluated using regular expressions to determine if a URL is present.
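The collection loop described above can be sketched roughly as follows. This is an illustration in Python rather than the Java tool used in the study. The `fetch_page` callable, the newest-first page ordering, and the URL pattern are all assumptions made for the sketch and are not details reported by the authors.

```python
import re
from datetime import date
from typing import Callable, List, Tuple

Comment = Tuple[date, str]  # (posting date, comment text)

# Hypothetical pattern: matches http(s) URLs and bare "www." hosts.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)


def extract_urls(text: str) -> List[str]:
    """Return every URL-like substring found in a comment."""
    return URL_PATTERN.findall(text)


def collect_comments(fetch_page: Callable[[int], List[Comment]],
                     start: date, end: date) -> List[Comment]:
    """Walk paginated comments and keep those dated within [start, end].

    fetch_page(n) stands in for the HTTP request plus HTML parsing step
    and returns the comments on page n (an empty list when none remain).
    """
    kept: List[Comment] = []
    page = 1
    while True:
        comments = fetch_page(page)
        if not comments:
            break
        for posted, text in comments:
            if start <= posted <= end:
                kept.append((posted, text))
        # Assumption: pages run newest first, so once a page reaches
        # back past the start date no later page can be in range.
        if min(posted for posted, _ in comments) < start:
            break
        page += 1
    return kept
```

Passing the page fetcher in as a parameter keeps the date-window logic separate from the network and parsing code, so the loop can be exercised against canned pages.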
Data Analysis and Results
A total of 153,602 comments were collected, and this amount consists of 71,499 comments from males, 56,779 from females, and 25,324 from users whose gender could not be determined. It was found that 1,276 comments contained hyperlinks, consisting of 587 for males, 322 for females, and 367 where the user's gender was unknown. The link count was higher at 1,920 and reflects instances of multiple URLs within individual postings.
The links comprised 762 unique URLs and 259 corresponding domains. These domains were manually categorized by website type, such as portals, media, party and politician homepages, petition sites, online fan clubs, and NGOs.
The sample of 1,920 hyperlinks was analyzed using LexiURL Searcher (Thelwall, 2009; Park, 2010) to generate a standard Webometric report. Ten of the most frequently occurring domains can be seen in Table 1. The 10 domains listed in Table 1 represent 24.5% of the total hyperlinks found; the remaining domains have been omitted for brevity.
The list of 10 prominent domains, by link count, in the sample consists entirely of domestic Korean services. The three main portals, Daum, Nate, and Naver, are all represented to some degree, as is the smaller portal service paran.com. Tistory.com, a community blogging service operated by Daum, was found to occur frequently, as were internal links to other profile pages on Cyworld. The four remaining domains in the sample represent the dominant online news providers in Korea. Donga.com is the online version of the long-running daily newspaper Dong-a Ilbo. Likewise, hani.co.kr is a service operated by Korean newspaper Hankyoreh, chosun.com is owned by the Chosun Ilbo, and joins.com is a website from the JoongAng Ilbo newspaper.
Table 2 shows the distribution of top-level domains (TLDs) reported by LexiURL within the sample of captured domains. Nearly half of links were targeted to .com sites and approximately one-third were found to contain a .kr suffix, the country code top-level domain (ccTLD) that denotes a Korean service. Overseas ccTLDs (.cn for China, .us for USA, .my for Malaysia) and other generic top-level domains (gTLDs), such as .net and .org, accounted for the remaining 20.85% of links and 54 domains.
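A TLD distribution of this kind can be approximated with a short script. The sketch below is not LexiURL's implementation. It simply takes the last label of each hostname, and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def tld_of(url: str) -> str:
    """Return the last hostname label, e.g. 'kr' for hani.co.kr."""
    # urlparse only fills in the hostname when a scheme or "//" is present.
    if "://" not in url:
        url = "//" + url
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1] if "." in host else ""


def tld_shares(urls):
    """Map each TLD to its percentage share of the supplied links."""
    counts = Counter(tld_of(u) for u in urls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tld: round(100 * n / total, 2) for tld, n in counts.items()}
```

Counting links and not unique domains mirrors the way the shares are reported in the text, where both a link percentage and a domain count are given.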
Generic top-level domains are ubiquitous in Korea and therefore the previous example that examined the visible URL in isolation does not confidently denote the physical location of the service that each link refers to. Barnett, Chung, and Park (2010) highlighted the lack of coverage of network analysis based on gTLDs and indicate that an analysis of ccTLDs is the more common scenario despite the prevalence of gTLDs in Korea and elsewhere.
To reduce the ambiguity created by the high usage of gTLDs for national services, a network query tool was developed to perform a DNS (Domain Name System) lookup to resolve the domain into a public Internet Protocol (IP) address, which is then forwarded to an IP locator service to reveal the physical location of the server that is mapped to the IP address in question. This approach will locate the country of the server where the website is hosted; however, the company and server could be located in different countries. To mitigate this limitation, details of the owner of a domain name are accessible through public WHOIS libraries, and a network client was developed to query one such service to determine the country where the company or individual who registered the domain name resides. The WHOIS query and IP country lookup fields containing the two forms of location information are included in Table 3. Typically, these two fields are identical, as most large organizations host their service onsite; however, there are exceptions to this observation. For example, ko.wikipedia.org, which appeared in the results, is a subdomain of Wikipedia and registered in the USA, but the physical location of the server was found to be in The Netherlands. Whilst Table 3 indicates that Korean services are in the majority, only two of the domains listed contained the Korean ccTLD. The remainder utilize gTLDs, such as .com, and this finding may go some way to explain the results in the previous section that identified .com domains as the most frequently occurring.
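The two lookups can be sketched as below, as an illustration and not the authors' tool. `socket.gethostbyname` performs the DNS step and a raw WHOIS query follows RFC 3912 (TCP port 43, query terminated by CRLF). The country-line pattern is an assumption, since WHOIS output differs between registries. The IP locator service is left out because the source does not name it.

```python
import re
import socket
from typing import Optional


def resolve_ip(domain: str) -> str:
    """DNS lookup: domain name to an IPv4 address (needs network access)."""
    return socket.gethostbyname(domain)


def whois_query(domain: str, server: str, timeout: float = 10.0) -> str:
    """Send a raw WHOIS query to `server` and return the text response."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Assumed format: a line such as "Registrant Country: US" or "country: KR".
COUNTRY_LINE = re.compile(
    r"^\s*(?:Registrant\s+)?Country\s*:\s*([A-Za-z]{2})\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def registrant_country(whois_text: str) -> Optional[str]:
    """Extract a two-letter country code from WHOIS text, if one is present."""
    match = COUNTRY_LINE.search(whois_text)
    return match.group(1).upper() if match else None


def locations_differ(whois_country: Optional[str],
                     ip_country: Optional[str]) -> bool:
    """True when both countries are known and they disagree."""
    return None not in (whois_country, ip_country) and whois_country != ip_country
```

A case such as ko.wikipedia.org would be flagged by `locations_differ("US", "NL")`, which is the kind of registrant-versus-server mismatch the paragraph describes.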
A total of 1,849 URLs encountered in the sample were found to belong to services based in Korea, and this represents the majority of links as a whole. The predominant site being linked to was a petition service (agora.media.daum.net), and this finding is consistent with similar research that found linking to petitions was a key function of political action within social network services in other nations (Panagiotopoulos et al., 2011). The 210 links to a petition service were found to point to 30 separate petitions, and of these links 152 are associated with just two petitions. The most prominent, with 116 links, was a petition to impeach the governing president Lee Myung-Bak. The second most prominent, with 36 links, was a petition to encourage a government official to resign following allegations of stalking.
In addition to Daum's agora petition service, blog and café (forum) services operated by Daum were found in the results. The 51 hyperlinks to a Daum-hosted café comprised 25 unique URLs. The most prominent, with 12 links, was a thread for venting negative sentiment towards President Lee. Seven links were found to point to a thread with support for Kang Ki Kab, a popular National Assembly member who is outspoken in his criticism of American beef imports and the continued military presence on the Korean peninsula.
Twenty-six blogs hosted on Daum were present from the sum of 69 links discovered. A memorialized page previously managed by late ex-president Roh Moo-Hyun accounted for 36 links and represents more than half of the total. A blog managed by the Creative Korea Party on Daum's Tistory service was linked to 61 times.
Offering similar services to Daum, Naver's blog, café, and news domains were found in the results. Naver's news service accounted for 55 unique links, with the majority of news stories covering economic issues such as the government deficit and divide between living standards of the rich and poor. The 72 links to Naver's blog service were distributed between 47 blogs, whereas 76 of the 106 links to a café on Naver were found to be pointing to a single thread hosted by nonodemo.com, a service that encourages peaceful demonstrations, such as the candlelight protest, as an alternative to violent street rallies.
Links to Social Solidarity Bank (bss.or.kr), a microcredit organization, were found in 56 comments. Empowering the poor has become a prominent sociopolitical issue in Korea and, similarly, links to Social Enterprise, a government department that encourages philanthropic business ventures, were present in 49 comments. Internal links to Cyworld profiles were found in 139 comments and two of the most frequently occurring were members of the general public and not, as might be expected, politicians or entertainers. This finding may demonstrate how ordinary citizens can become visible, and in some cases more visible than public figures, when online.
The 71 links that refer to international services represent a small proportion of the 1,920 links encountered and therefore only Korean results appear in the domains listed in Table 3. To address this omission, Table 4 shows the most prominent services, by link count, that were found to be located outside Korea. The IP and WHOIS fields of Table 4 indicate that the majority of hyperlinks refer to websites in the USA. Eight other countries, Australia, Canada, China, Germany, Malaysia, The Netherlands, Singapore, and the UK, contributed just 13 links overall.
Whilst the number of international links was small in comparison to domestic services, those international links that were found appeared to have a strong Korean emphasis. For example, the most frequent international link pointed to video-sharing service YouTube. This accounted for 15 links to 12 unique URLs. Of these 12 URLs, 10 pointed to videos in Korean and one to a video in English whose topic was the candlelight protests in Korea. The final URL pointed to a Western music video. Similarly, all 12 links to Google's Video service were found to point to a single video that reported on the issue of Mad Cow Disease in Korea.
Five hyperlinks pointing to news provider Reuters linked to a single story regarding the contamination of cattle feed, and although two links to the BBC news service pointed to separate stories, the theme of Mad Cow Disease was present in both. The first reported a case of suspected vCJD (the human form of BSE) in the UK and the second covered the candlelight protests in Korea. A link to CNN's iReport citizen journalism service was found, and the story being linked was critical of the perceived heavy-handedness of the Korean security services towards demonstrators during the candlelight protests. Similarly, a Portable Document Format (PDF) file hosted by the United States Department of Agriculture detailed the meat classification scheme for cattle. Five links to this file were found and may indicate concern regarding American beef imports. One link to a story covering Korea-Japan relations was found on the online edition of the International Herald Tribune.
The personal homepage of James Won, a second-generation Korean-American aiming to become the president of Korea, was found to occur three times. Despite having a .us ccTLD, the hosting company, AwardSpace, is registered in Germany and the WHOIS field reflects this. Five hyperlinks pointed to thesixsystem.net, a social welfare initiative that aims to reduce the wage discrepancy between rich and poor. The final link, danawa.tk, is believed to be Spam.
The users who composed comments were split between those logged into Cyworld when submitting a comment and those who commented anonymously. Of the comments containing links, 367 were from anonymous users and 909 were from users logged into Cyworld. The existence of mandatory verification for all Cyworld accounts enabled gender information to be determined for users who submitted a comment when logged into Cyworld. Comments containing links from males were found to be more frequent than those from females (Male = 587; 64.58%, Female = 322; 35.42%), although the type of content being linked to did not vary considerably. For the full sample of collected comments where the gender could be determined (n = 128,278 of N = 153,602), the bias towards male-submitted comments is present though less marked (Male = 71,499; 55.74%, Female = 56,779; 44.26%). This may indicate that although commenting on political profile pages is common for both genders, the submission of hyperlinks is an activity that is practiced more by males than females.
The comments were categorized into six groups, based on the gender of the poster and the political affiliation of the politician that the message was directed to. This is summarized in Table 5. Politicians from the ruling Grand National Party (GNP) received 562 comments containing links, and therefore just under 60% of all links (of the total N = 939) posted in the sample of Cyworld profile pages were intended for politicians within the GNP. Links posted to GNP politicians by males, at 54.45% (n = 306 of N = 562), represent a higher proportion than those from both females (n = 172; 30.6%) and users whose gender was unknown (n = 84; 14.95%).
Males were also the largest gender group to submit a comment containing a link to a profile managed by an opposition party politician, with 42.71% (n = 161) of all comments containing links within this category (N = 377) coming from males. Unlike the results for the ruling party, the share of comments containing links from users whose gender was unknown was high, representing 31.83% (n = 120) of hyperlinks posted to politicians in an opposition party (N = 377). Comments containing links for opposition politicians from females amounted to 25.46% (n = 96), the lowest proportion among the gender groups commenting to an opposition party (N = 377).
The previous section extracted links within the sum of comments and analyzed this in isolation. This section will examine the accompanying text that is present with most comments. The first part will determine the intention of using a link for a smaller subset of comments using established criteria. Following this, machine-learning algorithms will be used to categorize the sentiment and determine content across the sample as a whole.
To determine the intention of users posting hyperlinks, a random sample of 50 comments from each of the six user categories was generated. Two coders were employed to categorize the sample following the five link intention types proposed by Ackland et al. (2010), with an additional field to account for Spam. The coders agreed on 206 comments from the full sample of 300. The remaining comments where disagreements occurred were removed from the study to ensure intercoder reliability.
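The agreement figure reported above (206 of 300 comments, roughly 68.67%) is a simple percent agreement. The sketch below shows that calculation together with Cohen's kappa, a chance-corrected measure that the source does not report and that is included here only as an illustration. The function names and label strings are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items on which two coders assigned the same label."""
    agree = sum(x == y for x, y in zip(coder_a, coder_b))
    return agree / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Expected agreement if each coder labeled at random with
    # their own observed label frequencies.
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1:
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


def agreed_only(items, coder_a, coder_b):
    """Keep only the items where both coders agree, as done in the study."""
    return [(item, x) for item, x, y in zip(items, coder_a, coder_b) if x == y]
```

Dropping disagreements, as the study did, guarantees agreement on the retained subset but shrinks the sample. A chance-corrected statistic computed on the full 300 comments would describe coder reliability before that filtering.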
Table 6 summarizes the results. Of the five actions Ackland et al. (2010) reported that links can perform, and including an additional field for Spam, Message Amplification accounted for just over half (n = 106; 51.46%) of all link types. Hyperlinks used for Network Building (n = 58; 28.16%) were also high and together with Message Amplification represent 79.61% (n = 164) of all link types. Spam accounted for 15.53% (n = 32) of links and the remaining link types, Information Provision, Identity Building, and Audience Sharing, contributed just 4.85% (n = 10).
Comments within the opposition party sample (N = 99) were found to be highest among those containing links that were classified as performing a Network Building purpose (n = 35; 35.35%), and those identified as Message Amplification (n = 38; 38.38%). However, it was found that female Cyworld users within this sample tended to favor using hyperlinks for Network Building (n = 20; 48.78%), whereas the male opposition party sample predominantly contributed links found to have a Message Amplification (n = 13; 43.33%) motive. Moreover, the male opposition party sample contributed the fewest links of all user groups identified as having a Network Building purpose and, at just four comments (4.04%), suggests an underrepresented intention for male commenters compared to the opposition party female contribution, which was recorded at 20 comments (20.2%).
In contrast to the results for opposition parties, ruling party hyperlinks (N = 107) were primarily within the Message Amplification category (n = 68; 63.55%). Links from females in this category (n = 29; 42.65%) were found to be higher than those of males (n = 23; 33.82%), whereas the smallest proportion was found to be comments from users whose gender was unknown (n = 16; 23.53%). Whilst the use of hyperlinks for Network Building purposes within comments directed to the ruling party was less common than those intended as Message Amplification, the distinction between gender types (male n = 5; 21.74% and female n = 6; 26.09%) was less marked. However, the majority of comments within this category (N = 23) were found to be from users whose gender could not be determined (n = 12; 52.17%).
Spam was higher for opposition parties (n = 19; 59.38%) than was found to be the case for the ruling party (n = 13; 40.63%). Overall, links identified as being Spam (N = 32) were higher for males (n = 15; 46.88%) than was the case for females (n = 12; 37.5%). Males contributed more Spam comments intended for the ruling party than females (Male n = 7; 53.85% and Female n = 3; 23.08%), although the difference between males (n = 8; 42.11%) and females (n = 9; 47.37%) posting Spam links to politicians in an opposition party was less divergent. Spam from users not logged into Cyworld was surprisingly small for both opposition (n = 2; 6.25%) and ruling (n = 3; 9.38%) parties, and represents either the smallest or joint smallest of hyperlink types within opposition and ruling party samples (N = 32). Allowing users to comment anonymously was believed to encourage Spam; however, this sample, albeit limited, appears to suggest that Spam was more common from users whose identity could be verified.
Comments containing hyperlinks are typically accompanied by textual information, and whilst the previous method of analyzing links provided an indication of the intent of each message, the content and sentiment of the statement were not accounted for. Natural-language processing (NLP) is one approach to categorization that can produce usable results from datasets that are too large to code manually. In a related field, formally known as word-sense disambiguation, the grading of individual words or pairs of words within a statement is performed to isolate words and determine their association with other words. The procedure for grading a statement is to tokenize the sentence into individual words, remove weak words ("and" and "or", for example), reduce each word to its root form (known as stemming), and finally grade words through a predefined dictionary based on their affective value, such as ANEW (Stevenson, Mikels, & James, 2007) or WordNet (Miller, 1995); the sum of these grades gives the aggregate score for the statement as a whole. Grading words through a dictionary can attain high accuracies when applied to structured text, such as editorially defined tags or metadata; however, the reliability of a given classification is reduced when assessing text that is less well-formed, such as user-generated content (Cui, Zhang, Liu, & Ma, 2011). There also exists a compromise between natural-language processing and word-sense disambiguation that attempts to draw attention to words that frequently collocate, and whose meaning may be lost when viewed in isolation. Collocation analysis, which finds tokens that appear together frequently, performs better on unclean data; however, the approach will only resolve specific terms and does not, by itself, infer polarity.
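The dictionary-grading procedure (tokenize, drop weak words, stem, look up, sum) can be illustrated with a toy example. The stop-word list, the four-entry lexicon, and the crude suffix stripper below are all invented for the sketch. They stand in for resources such as ANEW and a real stemmer and carry no empirical weight.

```python
import re

# Hypothetical stop-word list and lexicon, for illustration only.
STOP_WORDS = {"and", "or", "for", "the", "a", "of", "to", "is"}
LEXICON = {"protest": -1.0, "support": 1.0, "corrupt": -2.0, "hope": 1.5}
SUFFIXES = ("ing", "ed", "ion", "s")


def stem(word: str) -> str:
    """Crude stemmer: strip the first matching suffix if 3+ letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def grade(statement: str) -> float:
    """Sum the lexicon values of the stemmed, non-stop-word tokens."""
    tokens = re.findall(r"[a-z']+", statement.lower())
    stems = [stem(t) for t in tokens if t not in STOP_WORDS]
    return sum(LEXICON.get(s, 0.0) for s in stems)
```

The example also hints at why the approach degrades on informal text: any misspelling, slang term, or word the stemmer mangles simply scores zero.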
In contrast to grading individual or pairs of words within a statement, supervised approaches to machine learning evaluate the context of the sentence by training a classifier on context-specific precoded text from a comparable domain with the salient properties of the corpus present. This has the benefit of constraining analysis to specific characteristics, such as polarity, but can omit characteristics outside the focus of the study. Supervised learning approaches are, when trained on similar domains, able to attain accuracy levels that make the approach a compelling option when evaluating unstructured data. Instead of measuring the association between variables, supervised learning aims to determine the strength of connection between predefined classes. Maintaining the separation between training data and unseen data is vital to ensure the classifier is predicting and not, as the case might be, overfitted to a predefined outcome. In this respect, the larger the training set, the more varied and therefore the more reliable the classifier will be; however, a poorly trained classifier can still perform well provided the problem scope is limited. Restricting the choice to two categories, such as positive and negative, constrains the set of options and therefore mitigates the consequences of a false classification. For example, Pang and Lee (2008) examined the classification of film reviews but restricted evaluation to broad categories, such as positive or negative, as opposed to the level of granularity that typically occurs in reviews for movies, products, and services. In spite of the acknowledged limitations of automated classification solutions, the approach offers a scalable implementation that can go some way towards coding polarity in user-generated content.
A Java program was developed that wrapped a small subset of the LingPipe NLP toolkit to enable two forms of analysis: Collocation and Sentiment Analysis. The absence of an accessible lookup dictionary of the kind available for English prohibited word grading on Korean text. Instead, frequently occurring words and proper nouns were located through an examination of collocations. The name of Korean President Lee Myung-Bak was found to occur 229 times, although he has no Cyworld account and therefore was not being addressed directly. Similarly, the name of deceased ex-president Roh Moo-Hyun was found to occur frequently (n = 59). This contrasts with similar studies of U.S. candidate profile pages, which found that names occur primarily in addressing the profile owner (Robertson, Vatrapu, & Medina, 2009). Mad Cow Disease (n = 159), beef (n = 110), American goods (n = 59), and candlelight protest (n = 139) occurred frequently and refer to the candlelight protests of May and June 2008. A Korean word that translates as ‘the gap between rich and poor’ occurred frequently (n = 69), as did the statistical term ‘Gini coefficient’ (n = 138).
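The idea of locating frequently co-occurring terms can be sketched by counting adjacent token pairs. This is a simplification under stated assumptions: it ranks pairs by raw frequency rather than by any statistical association measure, it splits on whitespace (Korean text would normally require a morphological analyzer), and the three English comments are invented placeholders rather than data from the study. It is not the LingPipe routine used by the Java program.

```python
from collections import Counter


def bigram_counts(comments):
    """Count adjacent token pairs across a list of comments."""
    counts = Counter()
    for comment in comments:
        tokens = comment.lower().split()
        # zip pairs each token with its successor: (t0, t1), (t1, t2), ...
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Invented placeholder comments, for illustration only.
comments = [
    "stop the candlelight protest crackdown",
    "join the candlelight protest tonight",
    "candlelight protest against beef imports",
]

counts = bigram_counts(comments)
print(counts.most_common(1))  # [(('candlelight', 'protest'), 3)]
```

As noted in the text, a count of this kind surfaces recurring terms but says nothing about whether they are used positively or negatively.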
Unlike an examination of textual collocations, supervised sentiment analysis requires training data: a set of manually coded input files that are representative of each category into which text is to be grouped. To address the absence of positive and negative sentiment text composed in Korean among publicly available corpora, a sample of unclassified text was coded by hired Korean students. Each student coded the same text into subjective-positive and subjective-negative categories. Where disagreements in categorization occurred, the statements in question were removed from the sample.
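The rule of discarding statements on which coders disagree can be expressed in a few lines. The function name `unanimous_items`, the data layout (a mapping from statement to the list of labels assigned by each coder), and the sample statements are hypothetical and chosen only to illustrate the filtering step.

```python
def unanimous_items(codings):
    """Keep only statements that every coder labelled identically.

    codings maps each statement to the list of labels given by the coders.
    Returns a mapping from statement to the agreed label.
    """
    return {
        statement: labels[0]
        for statement, labels in codings.items()
        if labels and len(set(labels)) == 1
    }


# Hypothetical codings from three coders.
codings = {
    "statement A": ["positive", "positive", "positive"],
    "statement B": ["positive", "negative", "positive"],  # disagreement
    "statement C": ["negative", "negative", "negative"],
}

print(unanimous_items(codings))
# {'statement A': 'positive', 'statement C': 'negative'}
```

Filtering in this way trades sample size for label quality: the surviving training set is smaller, but every item in it carries an uncontested label.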
Previous studies have indicated that the majority of user-generated comments submitted to social networking sites are subjective in tone (Thelwall & Wilkinson, 2010), and therefore the accuracy-reducing step of preclassifying comments as objective or subjective was not performed. Other studies have suggested that neutral comments may play a part in classification. Two approaches to handling them are either to add a third, neutral set of training data to the positive and negative texts (Ahn, Geyer, Dugan, & Millen, 2009) or to rely on positive and negative training data but classify a statement as neutral if the confidence of its classification falls below a predefined threshold (Cui, Zhang, Liu, & Ma, 2011). The latter approach was chosen as a suitable model to follow owing to the difficulty of identifying neutral comments on social network message boards, and therefore comments evaluated as comprising equally positive and negative sentiment were deemed to be neutral and removed from the sample.
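A threshold-based treatment of neutral comments might look like the sketch below. It assumes a binary classifier that outputs a probability for the positive class; the function name `label_with_neutral` and the default threshold of 0.6 are illustrative assumptions, not values reported in this study.

```python
def label_with_neutral(p_positive, threshold=0.6):
    """Label a statement from the positive-class probability of a binary classifier.

    The statement is neutral when the winning class falls below the threshold.
    """
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must be a probability between 0 and 1")
    # Confidence is the probability of whichever class is more likely.
    confidence = max(p_positive, 1.0 - p_positive)
    if confidence < threshold:
        return "neutral"
    return "positive" if p_positive >= 0.5 else "negative"


print(label_with_neutral(0.9))  # positive
print(label_with_neutral(0.1))  # negative
print(label_with_neutral(0.5))  # neutral
```

A statement scored as exactly half positive and half negative always falls below any threshold above 0.5, which matches the case of equally positive and negative sentiment described above.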
Figure 1 indicates that, for the most part, negative comments were in the minority for all months. However, it is worth noting that this sample represents only those comments that remained on the message board and were not deleted. Legal mechanisms exist in Korea to forbid the posting of negative comments to politicians during election periods (Lee, 2009), and any subsequent deletion of negative content may result in only those viewpoints with a particular bias persisting. Nevertheless, despite the possibility that deleted comments are missing, May and June 2008 were found to have high numbers of comments containing links that showed negative sentiment, and these months correspond with the period of the candlelight protest, which has been viewed as a time of increased civil action and negative sentiment online (Park et al., 2011). May 2009 also shows a large number of comments containing hyperlinks and negative sentiment, coinciding with the harassment and suicide of ex-president Roh Moo-Hyun.
Although the initial overhead of supervised learning is greater than that for collocations, the approach has the benefit of providing a mechanism to determine the accuracy of the developed classifier. This margin of error is represented in Figure 1, and based on the average conditional probability, a confidence interval of 86.67% was achieved for the polarity model. The outcome of a confirmed dataset, a gold standard (Kim, Zhai, & Han, 2010), can be a factor in ensuring accuracy, and a 10%–15% margin for error is seen as acceptable for the domain provided disagreements between coders are handled correctly (Reidsma & Carletta, 2008). Whilst this result appears promising and the existence of a confidence interval provides a degree of validity when evaluating the polarity of a statement, there are circumstances where sentiment analysis may overlook relevant meaning. Thelwall, Buckley, Paltoglou, Cai, and Kappas (2010) highlight this concern and draw attention to the inherent shortcomings of performing sentiment analysis on the diffuse content found within social network and microblogging services. Additionally, Pang and Lee (2008) point out that hidden meaning such as sarcasm and innuendo can be overlooked by machine-learning methods, and limitations arising from language complexity should be acknowledged when assessing the effectiveness of the approach.
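One simple way to check a classifier against a gold standard is to compare its predictions with the agreed labels and test whether the resulting error falls within an accepted margin. The sketch below is a generic illustration under stated assumptions: the function names, the ten invented labels, and the use of plain accuracy are hypothetical, and it does not reproduce how the 86.67% figure above was derived from the average conditional probability.

```python
def accuracy(predicted, gold):
    """Proportion of predictions that match the gold-standard labels."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal in length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def within_margin(acc, max_error=0.15):
    """True when the error rate (1 - accuracy) does not exceed max_error."""
    return (1.0 - acc) <= max_error


# Invented labels, for illustration only.
gold = ["pos"] * 5 + ["neg"] * 5
predicted = ["pos"] * 5 + ["neg"] * 4 + ["pos"]  # one misclassification

acc = accuracy(predicted, gold)
print(acc)                 # 0.9
print(within_margin(acc))  # True
```

The default `max_error` of 0.15 mirrors the upper end of the 10%–15% margin cited above; a stricter study could pass 0.10 instead.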
This study has shown how leveraging existing technologies, such as hyperlinks, can mediate democratic processes by determining the motivation, domain choice, sentiment, and message content of users commenting on political profile pages. The motivation of users was determined using established criteria that evaluated the purpose of posting a link and ascertained how hyperlinks lead to different ways in which ideological messaging can spread through social media. The domain choice was discovered through Webometric Analysis, DNS, and WHOIS lookup to locate the services and nations that occur frequently. Finally, the content and sentiment were revealed through the use of NLP, specifically sentiment analysis and collocation.
The findings suggest that hyperlinks are almost solely targeted to Korean services, and the few that do point to overseas sites are usually related in some way to local issues in Korea. This frames the discourse on Cyworld as one with a largely domestic political agenda and indicates that online politics is unlikely to take place across national borders, except where the topic in question involves another nation or comparable issue. Whilst political discourse was, primarily, found to rely on services based within national borders, less clear previously was how online domestic services are employed to strengthen arguments and demonstrate affinity.
Males are marginally more likely than females to comment on political profile pages using links, and profiles managed by ruling politicians were found to be of greater prominence than those of opposition parties. Message Amplification and Network Building were found to be the dominant purposes for submitting links within user-generated comments. Two forms of machine-based analysis, sentiment analysis and collocation of significant phrases, revealed primarily negative sentiment towards President Lee and his role in the reintroduction of American beef imports. Issues surrounding the suicide of ex-President Roh suggested anger towards those who were seen to be harassing him prior to his death. In both cases, participation increased online, and this reaction to external stimuli has also been reported in other nations.
Whilst extreme online reactions to perceived injustices are widely accepted, less clear previously was how links are used to amplify user comments. The findings suggest that contributing a link has the potential to complement political debate and provide a measure when evaluating discourse online that is scalable to larger datasets.
The authors are grateful to Se Jung Park and Yon Soo Lim for their feedback on earlier versions of the article. This research was supported by the World Class University (WCU) project through the National Research Foundation of Korea, funded by the Ministry of Education, Science and Technology (No. 515-82-06574). The first author acknowledges that the work in this article was conducted during his stay at the WCU Webometrics Institute.
Steven Sams is a doctoral candidate at the School of Information Systems, Computing, and Mathematics at Brunel University, and a former research fellow of WCU Webometrics Institute at Yeungnam University. His research interests are political communication, social media analytics, and online data collection strategies.
Address: Department of Media & Communication, Yeungnam University, 214–1 Dae-dong, Gyeongsan-si, Gyeongsangbuk-do, 712–749, Republic of Korea. Email: firstname.lastname@example.org
Han Woo Park (Corresponding Author) is an Associate Professor and Director of CyberEmotions Research Institute, Yeungnam University. He serves as coeditor of Journal of Contemporary Eastern Asia and has guest edited special issues of Journal of Computer-Mediated Communication, Social Science Computer Review, and Scientometrics on Computational Social Science and Webometrics. Address: Department of Media & Communication, Yeungnam University, 214–1 Dae-dong, Gyeongsan-si, Gyeongsangbuk-do, 712–749, Republic of Korea. Email: email@example.com