
Parsing an XML file with jsoup

Let's say we have an XML-formatted file like this:

Snapshot of a Stack Overflow data dump XML file, posts.xml, in which each post is a row element carrying attributes such as Title and Tags.

Here is a sample program that extracts the title and tags of each post from the XML file above, using the jsoup library in Java.
Once the jsoup library has been added to the project, the program can be written as:


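import java.io.File;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupParser {

    public static void main(String[] args) throws Exception {

        // Path to the Stack Overflow posts dump; adjust it to your own location.
        File fileQuestion = new File("F:\\Data\\stackoverflow.com-Posts\\posts.xml");

        // Load the whole file into a jsoup document, reading it as UTF-8.
        Document docQuestion = Jsoup.parse(fileQuestion, "UTF-8");

        // Each post in the dump is stored as a <row> element.
        Elements eachRowQuestion = docQuestion.getElementsByTag("row");

        for (Element row : eachRowQuestion) {
            // attr() returns an empty string when the attribute is missing.
            String question = row.attr("Title");
            String tag = row.attr("Tags");

            System.out.println(question + "---->" + tag);
        }
    }
}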

Now we can easily save the extracted question titles and tags to a file in another format, such as CSV, TSV, or plain text.
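As a rough illustration of that last step, here is a minimal sketch that writes title and tag pairs to a tab-separated file using only the Java standard library. The class name TitleTagTsvWriter, the output file name titles_tags.tsv, and the sample rows are all hypothetical; in the real program the pairs would come from the attr("Title") and attr("Tags") calls in the loop above.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class TitleTagTsvWriter {

    // Replace tabs and line breaks with a space so a value cannot break the TSV layout.
    static String sanitize(String value) {
        return value.replaceAll("[\\t\\r\\n]+", " ").trim();
    }

    // Build one TSV line from a title and its tag string.
    static String toTsvLine(String title, String tags) {
        return sanitize(title) + "\t" + sanitize(tags);
    }

    // Write a header line followed by one line per {title, tags} pair.
    static void write(Path out, List<String[]> rows) throws IOException {
        try (PrintWriter writer = new PrintWriter(
                Files.newBufferedWriter(out, StandardCharsets.UTF_8))) {
            writer.println("Title\tTags");
            for (String[] row : rows) {
                writer.println(toTsvLine(row[0], row[1]));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample values standing in for the extracted attributes.
        List<String[]> rows = List.of(
                new String[] {"How do I parse XML in Java?", "<java><xml>"},
                new String[] {"Reading a file\tline by line", "<java><io>"});

        // Hypothetical output location in the working directory.
        write(Paths.get("titles_tags.tsv"), rows);
    }
}
```

A tab is used as the separator here because question titles often contain commas, which would need quoting in a CSV file. Tabs and line breaks inside a value are replaced with a space before writing.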
