Let's say we have an XML-formatted file like this:
Snapshot of a Stack Overflow data dump XML file, posts.xml
Here is a sample program that extracts the title and tags from the XML file above using the jsoup library in Java.
After adding the jsoup library to the project, we can write the program as follows:
import java.io.File;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupParser {
    public static void main(String[] args) throws Exception {
        // Load the dump file and parse it with jsoup.
        File fileQuestion = new File("F:\\Data\\stackoverflow.com-Posts\\posts.xml");
        Document docQuestion = Jsoup.parse(fileQuestion, "UTF-8");

        // Collect every <row> element and read its Title and Tags attributes.
        Elements rows = docQuestion.getElementsByTag("row");
        for (Element row : rows) {
            String question = row.attr("Title");
            String tag = row.attr("Tags");
            System.out.println(question + "---->" + tag);
        }
    }
}
Now we can easily save the extracted question titles and tags in another file format such as CSV, TSV, or TXT.
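As a rough illustration of the saving step, here is a minimal sketch that writes title/tag pairs to a CSV file using only the Java standard library. The class name CsvExport, its helper methods, the output file name questions.csv, and the sample rows are all assumptions made for this example; they are not part of the program above. In practice, the pairs would come from the attr("Title") and attr("Tags") calls inside the loop.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original program.
class CsvExport {

    // Wrap a field in quotes and double any embedded quotes, so that
    // commas or quotes inside a title do not break the row.
    static String escape(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Build one CSV row from a question title and its tag string.
    static String toCsvLine(String title, String tags) {
        return escape(title) + "," + escape(tags);
    }

    // Write a header line followed by one row per {title, tags} pair.
    static void writeCsv(Path out, List<String[]> rows) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add("Title,Tags");
        for (String[] row : rows) {
            lines.add(toCsvLine(row[0], row[1]));
        }
        Files.write(out, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample values standing in for the extracted attributes.
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"How do I parse XML, fast?", "<java><xml>"});
        rows.add(new String[] {"What does \"static\" mean?", "<java>"});

        // Assumed output location: questions.csv in the working directory.
        writeCsv(Paths.get("questions.csv"), rows);
    }
}
```

Quoting every field keeps the sketch short and avoids having to check each title for commas. For a TSV file, the same structure would work with a tab as the separator, though tabs inside titles would then need their own handling.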