Extracting data from HTML page source

I need to extract specific data from a website.

I watched this YouTube video, https://www.youtube.com/watch?v=rru3G7PLVjw, and have a rough sense of how to code it.

Basically, what I want to do is very simple: extract the radio button text from the page source of https://docs.google.com/forms/d/1Mout_ImbF9N16EuCiYOxCrL6MbkUVkIEzijO1PAUQ68/viewform?key=pqbhTz7PIHum_4qKEdbUWVg, save it to a list, and print out the elements of the list.

Below is the C# code I wrote based on the YouTube video.

using System.Net;
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace ExtractDataFromWebsite
{
    class Program
    {
        static void Main(string[] args)
        {
            List<string> radioOptions = new List<string>();
            WebClient web = new WebClient();

            // download html from certain website
            string html = web.DownloadString("https://docs.google.com/forms/d/1Mout_ImbF9N16EuCiYOxCrL6MbkUVkIEzijO1PAUQ68/viewform?key=pqbhTz7PIHum_4qKEdbUWVg");

            MatchCollection m1 = Regex.Matches(html, @"<input\stype=/"radio"\sname=/"entry.2362106/"\svalue="(.+)\sid =/ "group_2362106_"
                , RegexOptions.Singleline);
            foreach (Match m in m1)
            {
                string radioOption = m.Groups[1].Value;
                radioOptions.Add(radioOption);
            }
            for (int i = 0; i < radioOptions.Count; i++)
                Console.WriteLine("" + radioOptions[i]);

            Console.ReadKey();
        }
    }
}

However, I'm having some problems with the line MatchCollection m1 = Regex.Matches ... that I don't know how to solve. Can anyone give me some hints or help me solve the problem above?

I'd suggest taking a look at HtmlAgilityPack.

Source

2016-06-30 xiaoxin

I suggest you read this [question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454).

Thanks a lot. You can load the source from the WebClient response into a new HtmlDocument and traverse it fairly easily from there.

Source

2016-06-30 15:58:18
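
For what it's worth, the HtmlAgilityPack suggestion could look roughly like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced (it is third-party, not part of .NET), and the class name RadioHapSketch, the method name ExtractRadioValues, and the sample markup are made up for illustration. In the real program the html string would come from web.DownloadString(...) as in the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package (assumed to be installed)

public static class RadioHapSketch
{
    // Returns the value attribute of every <input type="radio"> element in html.
    public static List<string> ExtractRadioValues(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//input[@type='radio']");
        if (nodes == null)
        {
            return values;
        }

        foreach (HtmlNode node in nodes)
        {
            // The second argument is the fallback used when the attribute is missing.
            values.Add(node.GetAttributeValue("value", string.Empty));
        }
        return values;
    }

    public static void Main()
    {
        // Hypothetical markup standing in for the downloaded form source.
        string html =
            "<form>" +
            "<input type=\"radio\" name=\"entry.2362106\" value=\"Option A\" id=\"group_2362106_1\">" +
            "<input type=\"radio\" name=\"entry.2362106\" value=\"Option B\" id=\"group_2362106_2\">" +
            "</form>";

        foreach (string value in ExtractRadioValues(html))
        {
            Console.WriteLine(value);
        }
    }
}
```

The XPath query selects elements by attribute, so it does not depend on the order of attributes inside the tag, which is where a hand-written regex tends to break.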

To extract the values, try this regex:

MatchCollection m1 = Regex.Matches(html, "<input type=\"radio\".+?value=\"(.+?)\".+?\">" 
      , RegexOptions.Singleline);

Source

2016-06-30 15:59:30
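
To try the regex approach without hitting the network, here is a small self-contained sketch. The pattern is a slight variant of the one above, not the same expression: it uses [^>]*? so a match cannot run past the end of a tag. The class name RadioRegexSketch, the method name ExtractRadioValues, and the sample markup are assumptions for illustration. Note also that inside a C# verbatim string (@"...") a double quote is written as "", not /" as in the question, which is one reason that line does not compile; the sketch uses a regular string with \" instead.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RadioRegexSketch
{
    // Returns the value attribute of every <input type="radio" ...> tag in html.
    // Assumes type="radio" is the first attribute, as in the question's markup.
    public static List<string> ExtractRadioValues(string html)
    {
        var values = new List<string>();

        // [^>]*? keeps the match inside one tag; the lazy (.*?) stops at the closing quote.
        string pattern = "<input\\s+type=\"radio\"[^>]*?\\svalue=\"(.*?)\"";

        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            values.Add(m.Groups[1].Value);
        }
        return values;
    }

    public static void Main()
    {
        // Hypothetical markup standing in for the downloaded form source.
        string html =
            "<input type=\"radio\" name=\"entry.2362106\" value=\"Option A\" id=\"group_2362106_1\">" +
            "<input type=\"radio\" name=\"entry.2362106\" value=\"Option B\" id=\"group_2362106_2\">";

        foreach (string value in ExtractRadioValues(html))
        {
            Console.WriteLine(value);
        }
    }
}
```

This only works while the attribute order stays the same as in the sample; if the page's markup changes, an HTML parser is likely the more robust choice.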