R Webscraping dataset

I am trying to build a dataset from the UN FAO website (http://www.fao.org/countryprofiles/en/). This page contains links to countries. Clicking a link opens that country's page, which contains news for that country. The idea is that the dataset would contain:

Country name 
Country url (e.g. <http://www.fao.org/countryprofiles/index/en/?iso3=AFG>) 
News url (e.g. <http://www.fao.org/afghanistan/news/detail-events/en/c/1045264/>) 
News title (e.g. World Food Day 2017 Celebrations in Afghanistan) 
News date (e.g. 17/11/2017)

So far, I have done the following:

    library(rvest)  # makes read_html() and the %>% pipe available

    ## Import web page
    FAO_Countries <- read_html("http://www.fao.org/countryprofiles/en/")

    ## Extract the country URLs (CSS selector found with 'selectorgadget')
    FAO_Countries_urls <- FAO_Countries %>%
      html_nodes(".linkcountry") %>%
      html_attr("href")

    ## Extract the country names from the same links
    FAO_Countries_links <- FAO_Countries %>%
      html_nodes(".linkcountry") %>%
      html_text()

    ## Combine the two previous objects into a data frame
    FAO_Countries_data <- data.frame(FAO_Countries_links = FAO_Countries_links,
                                     FAO_Countries_urls = FAO_Countries_urls,
                                     stringsAsFactors = FALSE)

How should I proceed from here?

Source

2017-11-25 Ileeo

You need to a) list the packages you have loaded, and b) show what difficulty you are having.

1. The packages are: rvest, stringr, tidyr, data.table, plyr, xml2. 2. I cannot retrieve the news and the news dates – Ileeo

Here is a solution...

If you want the news for all countries, you need to change the short name (i.e. USA, SWE, etc.) dynamically. An example for AFG is shown below.

    library(jsonlite)

    shortname <- "AFG"
    news_url <- paste0("http://www.fao.org/countryprofiles/common/allnews/en/?iso3=", shortname, "&allnews=no&limit=2")
    news <- fromJSON(news_url)

    title <- news[3]  # third column: news title
    date <- news[6]   # sixth column: news date (date_format)

    cbind(date, title)

      date_format title
    1 17/10/2017  World Food Day 2017 Celebrations in Afghanistan
    2 16/10/2017  On World Food Day, the future of migration and rural development is highlighted in the Asia-Pacific region

If you also want the news content, you can add this code:

    content <- news[5]
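
To make the "change the short name dynamically" step concrete, here is a minimal sketch of how the AFG example might be repeated for every country. The helper names `extract_iso3`, `build_news_url` and `fetch_country_news` are hypothetical. It assumes that each country URL contains an `iso3=XXX` query parameter and that the endpoint returns a data frame with `title` and `date_format` columns for every country, as in the AFG output above. I have not verified either assumption for other countries.

```r
# Hypothetical helper: pull the ISO3 code out of a country URL such as
# "http://www.fao.org/countryprofiles/index/en/?iso3=AFG".
# Returns NA for URLs without an iso3 parameter.
extract_iso3 <- function(country_url) {
  ifelse(grepl("iso3=[A-Za-z]{3}", country_url),
         sub(".*iso3=([A-Za-z]{3}).*", "\\1", country_url),
         NA_character_)
}

# Hypothetical helper: build the news URL used in the AFG example.
build_news_url <- function(iso3, limit = 2) {
  paste0("http://www.fao.org/countryprofiles/common/allnews/en/?iso3=",
         iso3, "&allnews=no&limit=", limit)
}

# Hypothetical helper: fetch the news for one country.
# Returns NULL if the request fails or the result does not look like
# the AFG example (assumed columns: title, date_format).
fetch_country_news <- function(iso3, limit = 2) {
  news <- tryCatch(jsonlite::fromJSON(build_news_url(iso3, limit)),
                   error = function(e) NULL)
  if (!is.data.frame(news) || nrow(news) == 0) return(NULL)
  if (!all(c("title", "date_format") %in% names(news))) return(NULL)
  data.frame(iso3 = iso3,
             news_date = news$date_format,
             news_title = news$title,
             stringsAsFactors = FALSE)
}

# Usage (needs network access, so it is left commented out):
# codes <- na.omit(extract_iso3(FAO_Countries_data$FAO_Countries_urls))
# all_news <- do.call(rbind, lapply(codes, fetch_country_news))
```

Countries that fail or return nothing are skipped, because `rbind` ignores `NULL` entries. The resulting `iso3` column could then be used to join the news back onto the country names and URLs from the question.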

Source

2017-11-25 20:25:06 maydin
