Test Metric Report
Last modified by Vincent Massol on 2018/05/11 17:38
Context: This report provides information related to testing in the XWiki project for the STAMP research project that XWiki SAS is participating in. However, since this information can be generally useful to everyone, the data is generated openly on this page.
Static Metrics
- Sources:
- sonar.xwiki.org
- For #Test Classes: find . -name "*Test.java" | wc -l and find . -name "*Tests.java" | wc -l in each git repo
- For #Parametrized Tests: the difference between the total number of tests executed and the number of test methods. If > 0, there are parametrized tests. To get the total number of tests executed, we use sonar. For example, for Commons: http://ci.xwiki.org/job/xwiki-commons/lastSuccessfulBuild/testReport/api/xml, and then take totalCount.
- For #UI Test Classes, we use find . -name "*Test.java" -path "*-test/*-tests/*" | wc -l in the platform repository. For Enterprise we simply count all files ending in "Test(s).java", since they are all UI tests.
- Right now XWiki Enterprise is not properly configured on sonar.xwiki.org
- For Enterprise # Test Methods, we have that info in Clover reports. Date of collection: 2017-02-22
- Date of data collection: 2016-12-02 (for NCLOC + #Classes + #Test Methods), 2017-01-06 for the rest and for "Enterprise"
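The find | wc -l commands above can be mirrored with a small Python sketch (a pure-Python stand-in for the shell pipeline; the directory layout in the demo is a made-up example, not a real repository):

```python
import os
import tempfile

def count_test_classes(root):
    """Count files matching the two patterns used in this report:
    *Test.java and *Tests.java (counted separately, like the two
    find commands above)."""
    test_count = tests_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith("Tests.java"):
                tests_count += 1
            elif name.endswith("Test.java"):
                test_count += 1
    return test_count, tests_count

# Demo on a throwaway tree (stand-in for a real git clone).
with tempfile.TemporaryDirectory() as repo:
    os.makedirs(os.path.join(repo, "src", "test", "java"))
    for name in ("FooTest.java", "BarTest.java", "AllTests.java", "Foo.java"):
        open(os.path.join(repo, "src", "test", "java", name), "w").close()
    print(count_test_classes(repo))  # -> (2, 1)
```

Running this per git repository and summing the two counts gives the "# Test Classes" column of the table below.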
Repository | NCLOC for Java code | # Classes | # Test Classes | # Test Methods | # Parametrized Tests | # UI Test Classes |
---|---|---|---|---|---|---|
Commons | 54429 | 1097 | 210 | 1057 | 1078 tests vs 1057 test methods | 0 |
Rendering | 37490 | 623 | 119 | 299 | 1610 tests vs 299 test methods | 0 |
Platform | 236808 | 3531 | 723 | 3382 | 3804 tests vs 3382 test methods | 69 |
Enterprise | ? | ? | 99 | 4288 | ? | 99 |
TOTAL | 328727 | 5251 | 1151 | 4738 | 6492 tests vs 4738 test methods | 168 |
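The "# Parametrized Tests" column is derived rather than measured directly: it is the surplus of executed tests over declared test methods. A minimal sketch using the table's own numbers:

```python
def parametrized_surplus(executed, methods):
    """Tests executed minus test methods declared; a positive surplus
    means some methods ran more than once, i.e. parametrized tests."""
    return executed - methods

# Numbers from the table above (tests executed vs. test methods).
for repo, executed, methods in [
    ("Commons", 1078, 1057),
    ("Rendering", 1610, 299),
    ("Platform", 3804, 3382),
]:
    print(repo, parametrized_surplus(executed, methods))
```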
Dynamic Metrics
Statement Coverage
- Source: CI pipeline job on ci.xwiki.org, see also this blog post.
- Date of data collection: 2016-12-20
Note: TPC stands for Test Percentage Coverage, a single aggregate coverage figure computed from the Clover reports.
- Commons: 66.2% TPC
- Commons + Rendering: 73.9% TPC
- Commons + Rendering + Platform: 67.5% TPC
- Commons + Rendering + Platform + Enterprise: 73.2% TPC
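Assuming TPC follows the usual Clover aggregation (covered statements + covered conditionals + covered methods, divided by their totals), it can be recomputed from a Clover report's global metrics element. The XML snippet below is made up for illustration; real reports carry these attributes on the project-level metrics element:

```python
import xml.etree.ElementTree as ET

# Made-up Clover-style metrics element (illustrative numbers only).
CLOVER_SNIPPET = """
<metrics statements="1000" coveredstatements="650"
         conditionals="400" coveredconditionals="270"
         methods="200" coveredmethods="150"/>
"""

def tpc(metrics):
    """TPC as an aggregate: covered elements / total elements, in %.
    Assumption: this is the formula behind the figures above."""
    kinds = ("statements", "conditionals", "methods")
    covered = sum(int(metrics.get("covered" + k)) for k in kinds)
    total = sum(int(metrics.get(k)) for k in kinds)
    return 100.0 * covered / total

print(round(tpc(ET.fromstring(CLOVER_SNIPPET)), 1))  # -> 66.9
```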
Flickering Tests
- Source: https://jira.xwiki.org with the JQL: labels = flickering and resolution = Unresolved and category = 10000
- This includes Commons, Rendering, Platform and Enterprise
- Date of data collection: 2017-02-22
- Value: 12
- List of JIRA issues:
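The count above can be reproduced against JIRA's standard REST search endpoint (/rest/api/2/search) with the JQL from the source bullet; building the query URL looks like this (no network call is made here):

```python
from urllib.parse import urlencode

# The JQL from the source bullet above.
JQL = "labels = flickering and resolution = Unresolved and category = 10000"

def search_url(base, jql, max_results=100):
    """Build a JIRA REST search URL for the given JQL; the 'total'
    field of the JSON response gives the issue count."""
    query = urlencode({"jql": jql, "maxResults": max_results})
    return f"{base}/rest/api/2/search?{query}"

print(search_url("https://jira.xwiki.org", JQL))
```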
Test Execution Times
- Source: The Clover Report mentions the test execution times
- Date of data collection: 2017-02-22
- Commons: 38 seconds for 1018 tests (16 failing)
- Commons + Rendering: 67 seconds for 2662 tests (19 failing)
- Commons + Rendering + Platform: 1871 seconds for 5709 tests (49 failing)
- Commons + Rendering + Platform + Enterprise: 7894 seconds for 9997 tests (136 failing)
Note: some of these tests don't really fail; this is a limitation in how Clover interprets the test results.
Deployment Time for Manual Testing
Metric: the amount of time needed to set up the software for manual testing. This does not include the time required to compile the source code into binaries.
- Source: time to install XWiki manually
- Date of collection: 2017-01-01
- Time to download the XWiki WAR + time to download a DB (MySQL, PostgreSQL) + time to download a servlet container (Tomcat, etc) + time to set it up = 2 hours
- Date of collection: 2017-02-22
- Install Docker once and use the XWiki Docker image (MySQL + Tomcat) = 5 minutes
System-specific bugs
- Source: Manual tests performed by the XWiki SAS QA team as reported on http://test.xwiki.org
- Report for 2016: http://test.xwiki.org/xwiki/bin/view/Reports/QA+2016/
- Note: we consider all those tests to be configuration-specific tests
- Results for 2016 (percentage of failing tests): 139 / 4967 * 100 = 2.79%
Crash replicating test cases
- Source: an approximation: search for JIRA issues containing a stack trace, then keep those whose fix also modified a test.
- JQL: description ~ '.java:' AND description ~ 'Exception' AND category = 10000 AND createdDate >= 2016-01-01 AND createdDate <= 2016-12-31 and resolution not in ("Cannot Reproduce", Duplicate, Inactive, Incomplete, Invalid, "Won't Fix") ORDER BY created DESC
- Date of collection: 2016
- Number of issues with stack traces (before removing the false positives): 25
- Number of issues with stack traces (after removing the false positives): 17
- Number of issues with stack traces and tests to prove the fix: 3
- % test suite = 3 / (total test number) = 3 / 9962 = 0.03%
- Total test number: 9962
- Date of collection: 2017
- Number of issues with stack traces (before removing the false positives): 23
- Number of issues with stack traces and tests to prove the fix: 5 (XWIKI-14802, XWIKI-14766, XWIKI-14613, XWIKI-14556, XWIKI-14152)
- % test suite = 5 / (total test number) = 5 / 10433 = 0.04%
- Total test number: 10433
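The "% test suite" figures above are plain ratios; they appear to be truncated rather than rounded (5 / 10433 = 0.0479% is reported as 0.04%). A minimal check of both years:

```python
import math

def pct_of_suite(fix_tests, total_tests, decimals=2):
    """Share of the whole suite made of crash-replicating tests,
    truncated (not rounded) to match the report's convention."""
    pct = 100.0 * fix_tests / total_tests
    factor = 10 ** decimals
    return math.floor(pct * factor) / factor

print(pct_of_suite(3, 9962))   # 2016 figure -> 0.03
print(pct_of_suite(5, 10433))  # 2017 figure -> 0.04
```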
Other Metrics
- Number of covered public methods: Don't know how to filter this out, maybe could be done with Clover Code contexts
- Execution time of test suites
- Mutation score with Pitest
Building a Regression Benchmark
- R1: (Warmup) Identify 10 Jira tickets about regression
- R2: Identify more Jira tickets about regression
- R3: (Warmup) Identify 10 commits which fix a regression
- R4: Identify more commits which fix a regression
- R5: (Warmup) Identify 10 commits which introduce a regression
- R6: Identify more commits which introduce a regression
DSpot
- Compute the percentage of false positives
- Compute the effectiveness of each amplification operator
- Set up extensible architecture to easily support new test case transformation operators
- Propose new test case transformation operators
- Contribute to high-quality user documentation
- Package DSpot as a Maven plugin usable in CI
- Package DSpot as a Gradle plugin usable in CI
- Implement live parallel regression testing
- Store generated tests and their history (define the API): commit version of the original test, test method name, seed used, amplification operators used
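The storage item in the last bullet could be sketched as a record like the one below. All field names and the operator names in the demo are hypothetical placeholders; the actual API is, as the bullet says, still to be defined:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AmplifiedTestRecord:
    """One stored generated test; fields mirror the bullet above.
    Hypothetical schema sketch, not DSpot's actual API."""
    original_test_commit: str              # commit version of the original test
    test_method_name: str                  # fully-qualified amplified method
    seed: int                              # random seed used for amplification
    amplification_operators: List[str] = field(default_factory=list)

record = AmplifiedTestRecord(
    original_test_commit="0a1b2c3",
    test_method_name="org.xwiki.SomeTest#testFoo_amplified",
    seed=42,
    amplification_operators=["stringLiteralOperator", "assertionGenerator"],
)
print(record.test_method_name)
```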
DSpot + Use cases
- Apply DSpot on the commits of R3 to see whether the regressions are caught
- Apply DSpot on the commits of R4 to see whether the regressions are caught
- Apply DSpot on the commits of R5 to see whether the regressions are caught
- Apply DSpot on the commits of R6 to see whether the regressions are caught