
Parsing an XML file with jsoup

Let's say we have an XML file like this one:

[Image: snapshot of posts.xml from a Stack Overflow data dump]

Here is a sample program that uses the jsoup library in Java to extract each post's title and tags from the XML file above.
After adding the jsoup library to our project, we can write the program as follows:


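import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public class JsoupParser {

    public static void main(String[] args) throws Exception {
        File fileQuestion = new File("F:\\Data\\stackoverflow.com-Posts\\posts.xml");

        // Parse with jsoup's XML parser instead of the default HTML parser, so the
        // content is not wrapped in <html>/<body> and name case is preserved.
        Document docQuestion;
        try (InputStream in = new FileInputStream(fileQuestion)) {
            docQuestion = Jsoup.parse(in, "UTF-8", "", Parser.xmlParser());
        }

        // Each post in posts.xml is a <row> element whose fields are attributes.
        Elements eachrowQuestion = docQuestion.getElementsByTag("row");
        for (Element row : eachrowQuestion) {
            String question = row.attr("Title");
            String tag = row.attr("Tags");

            // Answers have no Title attribute, so attr() returns ""; skip them.
            if (question.isEmpty()) {
                continue;
            }

            System.out.println(question + "---->" + tag);
        }
    }
}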
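The Tags value comes back as one string. In the older Stack Exchange dump format, each tag is wrapped in angle brackets (for example <java><xml>), so a small helper can split the string into individual tags. This is a minimal sketch that assumes that format. TagSplitter and splitTags are hypothetical names, and a dump that uses a different tag format would need different parsing.

```java
import java.util.ArrayList;
import java.util.List;

public class TagSplitter {

    // Splits a Tags value such as "<java><xml><jsoup>" into [java, xml, jsoup].
    // Assumption: each tag is enclosed in '<' and '>' (older dump format).
    public static List<String> splitTags(String tags) {
        List<String> result = new ArrayList<>();
        if (tags == null || tags.isEmpty()) {
            return result; // rows without tags (e.g. answers) give an empty list
        }
        int start = tags.indexOf('<');
        while (start >= 0) {
            int end = tags.indexOf('>', start + 1);
            if (end < 0) {
                break; // unmatched '<': stop instead of throwing
            }
            result.add(tags.substring(start + 1, end));
            start = tags.indexOf('<', end + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the list of individual tags.
        System.out.println(splitTags("<java><xml><jsoup>"));
    }
}
```

Inside the jsoup loop, splitTags(tag) could be called on each row's Tags value before the result is printed or saved.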

Now we can easily save the extracted question titles and tags in other file formats, such as CSV, TSV or plain text.
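For the CSV case, post titles often contain commas or double quotes, so each field needs quoting. The sketch below follows the usual RFC 4180 rules: a field that contains a comma, quote or line break is wrapped in double quotes, and any embedded quotes are doubled. CsvExport, escapeCsv and writeCsv are hypothetical names, and the two sample rows are made-up data for illustration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;

public class CsvExport {

    // Quotes a field if it contains a comma, double quote or line break;
    // embedded double quotes are doubled, as RFC 4180 describes.
    static String escapeCsv(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Writes a header row followed by one "title,tags" line per post (CRLF line endings).
    static void writeCsv(List<String[]> rows, Writer out) throws IOException {
        out.write("title,tags\r\n");
        for (String[] row : rows) {
            out.write(escapeCsv(row[0]) + "," + escapeCsv(row[1]) + "\r\n");
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample rows standing in for (Title, Tags) pairs from posts.xml.
        List<String[]> rows = List.of(
                new String[] {"Parsing XML, the easy way", "<java><xml>"},
                new String[] {"What does \"pass by value\" mean?", "<java>"});
        StringWriter sw = new StringWriter();
        writeCsv(rows, sw);
        System.out.print(sw);
    }
}
```

To write a real file, the jsoup loop could collect each question and tag pair into the list. It could then pass a writer from java.nio.file.Files.newBufferedWriter(Path.of("posts.csv")) instead of the StringWriter used here. For TSV, you would swap the comma for a tab and adjust the quoting rule to match.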
