MSR '18- Proceedings of the 15th International Conference on Mining Software Repositories
SESSION: Data showcase
50K-C: a dataset of compilable, and compiled, Java projects
We provide a repository of 50,000 compilable Java projects. Each project in this dataset comes with references to all the dependencies required to compile it, the resulting bytecode, and the scripts with which the projects were built.
MSR '20: Proceedings of the 17th International Conference on Mining Software Repositories
SESSION: Mining Challenge
The Software Heritage Graph Dataset: Large-scale Analysis of Public Software Development History
Software Heritage is the largest existing public archive of software source code and accompanying development history. It spans more than five billion unique source code files and one billion unique commits, coming from more than 80 million software projects. These software artifacts were retrieved from major collaborative development platforms (e.g., GitHub, GitLab) and package repositories (e.g., PyPI, Debian, NPM), and stored in a uniform representation linking together source code files, directories, commits, and full snapshots of version control system (VCS) repositories as observed by Software Heritage during periodic crawls. This dataset is unique in terms of accessibility and scale, and makes it possible to explore a number of research questions on the long tail of public software development, instead of focusing solely on "most starred" repositories, as often happens.
NL4SE 2018- Proceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering
SESSION: Keynote
Learning from code with graphs (keynote)
Learning from large corpora of source code ("Big Code") has seen increasing interest over the past few years. A first wave of work has focused on leveraging off-the-shelf methods from other machine learning fields such as natural language processing. While these techniques have succeeded in showing the feasibility of learning from code, and led to some initial practical solutions, they forgo explicit use of known program semantics. In a range of recent work, we have tried to solve this issue by integrating deep learning techniques with program analysis methods in graphs. Graphs are a convenient, general formalism to model entities and their relationships, and are seeing increasing interest from machine learning researchers as well. In this talk, I present two applications of graph-based learning to understanding and generating programs and discuss a range of future work building on the success of this work.
PESOS 2014- Proceedings of the 6th International Workshop on Principles of Engineering Service-Oriented and Cloud Systems
SLA evaluation with on-the-fly measurements of distributed service implementation over clouds
A survey on checkpointing web services
Consumer-centric non-functional properties of SOA-based services
Interaction patterns based checkpointing of choreographed web services
Towards the formalization of properties of cloud-based elastic systems
Towards exploiting the full adaptation potential of cloud applications
PROMISE 2021: Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering
SESSION: Papers
Heterogeneous ensemble imputation for software development effort estimation
Choosing the appropriate Missing Data (MD) imputation technique for a given Software Development Effort Estimation (SDEE) technique is not a trivial task. In fact, the impact of MD imputation on the estimation output depends on the dataset and the SDEE technique used, and no imputation technique is best in all contexts. Thus, an attractive solution is to use more than one imputation technique and combine their results into a final imputation outcome. This concept is called ensemble imputation and can help to significantly improve the estimation accuracy. This paper develops and evaluates a heterogeneous ensemble imputation whose members are four single imputation techniques: K-Nearest Neighbors (KNN), Expectation Maximization (EM), Support Vector Regression (SVR), and Decision Trees (DT). The impact of the ensemble imputation was evaluated and compared with that of the four single imputation techniques on the accuracy, measured in terms of the standardized accuracy criterion, of four SDEE techniques: Case Based Reasoning (CBR), Multi-Layers Perceptron (MLP), Support Vector Regression (SVR), and Reduced Error Pruning Tree (REPTree). The Wilcoxon statistical test was also performed in order to assess whether the results are significant. All the empirical evaluations were carried out over six datasets: ISBSG, China, COCOMO81, Desharnais, Kemerer, and Miyazaki. Results show that the use of heterogeneous ensemble-based imputation instead of single imputation significantly improved the accuracy of the four SDEE techniques. Indeed, the ensemble imputation technique was ranked either first or second in all contexts.
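The idea of ensemble imputation described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state how the members' results are combined, so a plain average is used here, and the three members (mean, median, and a small nearest-neighbour imputer) are simple stand-ins, not the paper's KNN, EM, SVR, and DT members. The function names and the toy `projects` data are hypothetical.

```python
from math import sqrt
from statistics import mean, median


def impute_mean(rows, r, c):
    """Estimate the missing cell (r, c) as the mean of the observed values in column c."""
    return mean(row[c] for row in rows if row[c] is not None)


def impute_median(rows, r, c):
    """Estimate the missing cell (r, c) as the median of the observed values in column c."""
    return median(row[c] for row in rows if row[c] is not None)


def impute_knn(rows, r, c, k=2):
    """Average column c over the k rows closest to row r on the features both rows have."""
    target = rows[r]
    candidates = []
    for i, row in enumerate(rows):
        if i == r or row[c] is None:
            continue
        shared = [j for j in range(len(row))
                  if j != c and row[j] is not None and target[j] is not None]
        if not shared:
            continue
        dist = sqrt(sum((row[j] - target[j]) ** 2 for j in shared) / len(shared))
        candidates.append((dist, row[c]))
    candidates.sort(key=lambda pair: pair[0])
    nearest = [value for _, value in candidates[:k]]
    # Fall back to the column mean when no comparable row exists.
    return mean(nearest) if nearest else impute_mean(rows, r, c)


# Stand-in members; the combination rule (averaging) is an assumption.
MEMBERS = (impute_mean, impute_median, impute_knn)


def ensemble_impute(rows, members=MEMBERS):
    """Return a copy of rows in which each None is replaced by the average member estimate."""
    completed = [list(row) for row in rows]
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if value is None:
                estimates = [member(rows, r, c) for member in members]
                completed[r][c] = mean(estimates)
    return completed


# Hypothetical toy data: [size, effort]; the last project has no recorded effort.
projects = [
    [10.0, 100.0],
    [12.0, 120.0],
    [50.0, 500.0],
    [11.0, None],
]
completed = ensemble_impute(projects)
print(completed[3])
```

In this toy data the three members disagree noticeably on the missing effort value (the column mean is pulled up by the one large project, while the nearest-neighbour estimate is not), which is the situation in which combining several imputers is meant to help.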
QoSA '14- Proceedings of the 10th international ACM Sigsoft conference on Quality of software architectures
SESSION: Keynote address
Trust or verify?
SESSION: Performance of architectures
Automatic detection of performance anti-patterns in inter-component communications
Architectural tactics support in cloud computing providers: the jelastic case
Performance-based selection of software and hardware features under parameter uncertainty
SESSION: Performance modelling
Dealing with uncertainties in the performance modelling of software systems
Experiences with modeling memory contention for multi-core industrial real-time systems
Using architecture-level performance models as resource profiles for enterprise applications
SESSION: Architecture evaluation
Empirical resilience evaluation of an architecture-based self-adaptive software system
Architecture management and evaluation in mature products: experiences from a lightweight approach
Failure data collection for reliability prediction models: a survey
SESSION: Architecture analysis I
Efficient re-resolution of SMT specifications for evolving software architectures
Regression verification of AADL models through slicing of system dependence graphs
Evaluation of a static architectural conformance checking method in a line of computer games
SESSION: Architecture analysis II
An empirical investigation of modularity metrics for indicating architectural technical debt
Formalizing correspondence rules for automotive architecture views
SRMP: a software pattern for deadlocks prevention in real-time concurrency models
TUTORIAL SESSION: Tutorials
Software QoS enhancement through self-adaptation and formal models
Designing and evolving distributed architecture using kevoree
RAISE 2014- Proceedings of the 3rd International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering
Towards more intelligent trace retrieval algorithms
A mapping study on bayesian networks for software quality prediction
SANAYOJAN: a framework for traceability link recovery between use-cases in software requirement specification and regulatory documents
Supporting comprehension of unfamiliar programs by modeling an expert's perception
Machine learning for constituency test of coordinating conjunctions in requirements specifications
OCL usability: a major challenge in adopting UML
A self-learning approach for validation of communication in embedded systems
Deriving time lines from texts
Mining issue tracking systems using topic models for trend analysis, corpus exploration, and understanding evolution
RAISE '18- Proceedings of the 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering
SESSION: Natural language and text data
Integrating a dialog component into a framework for spoken language understanding
Spoken language interfaces are the latest trend in human computer interaction. Users enjoy the newly found freedom but developers face an unfamiliar and daunting task. Creating reactive spoken language interfaces requires skills in natural language processing. We show how a developer can integrate a dialog component in a natural language processing system by means of software engineering methods. Our research project PARSE that aims at naturalistic end-user programming in spoken natural language serves as an example. We integrate a dialog component with PARSE without affecting its other components: We modularize the dialog management and introduce dialog acts that bundle a trigger for the dialog and the reaction of the system. We implemented three dialog acts to address the following issues: speech recognition uncertainties, coreference ambiguities, and incomplete conditionals.
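The notion of a dialog act that bundles a trigger with the system's reaction can be sketched roughly as follows. This is not PARSE's actual API: the `Utterance` and `DialogAct` types, the confidence threshold, the keyword check for conditionals, and the `run_dialog_manager` function are all hypothetical names and simplifications chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Utterance:
    """A recognized utterance; one recognizer confidence per word (hypothetical shape)."""
    words: List[str]
    confidences: List[float]


@dataclass
class DialogAct:
    """Bundles the condition that opens a dialog with the system's reaction to it."""
    name: str
    trigger: Callable[[Utterance], bool]
    reaction: Callable[[Utterance], str]


def low_confidence_word(utterance: Utterance, threshold: float = 0.5) -> Optional[str]:
    """Return the first word recognized below the (assumed) confidence threshold, if any."""
    for word, confidence in zip(utterance.words, utterance.confidences):
        if confidence < threshold:
            return word
    return None


uncertain_recognition = DialogAct(
    name="speech-recognition-uncertainty",
    trigger=lambda u: low_confidence_word(u) is not None,
    reaction=lambda u: f"Did you say '{low_confidence_word(u)}'?",
)

incomplete_conditional = DialogAct(
    name="incomplete-conditional",
    # Crude keyword heuristic, used only to keep the sketch short.
    trigger=lambda u: "if" in u.words and "then" not in u.words,
    reaction=lambda u: "What should happen if the condition holds?",
)


def run_dialog_manager(utterance: Utterance, acts: List[DialogAct]) -> List[str]:
    """Ask every registered act whether it is triggered and collect the reactions."""
    return [act.reaction(utterance) for act in acts if act.trigger(utterance)]


utterance = Utterance(
    words=["if", "the", "cup", "is", "empty"],
    confidences=[0.9, 0.9, 0.3, 0.9, 0.9],
)
questions = run_dialog_manager(utterance, [uncertain_recognition, incomplete_conditional])
print(questions)
```

Because each act carries its own trigger and reaction, a new act can be registered with the manager without changing the other acts or the rest of the pipeline, which matches the modularization goal stated in the abstract.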
RCoSE 2014- Proceedings of the 1st International Workshop on Rapid Continuous Software Engineering
SESSION: Overview
Continuous software engineering and beyond: trends and challenges
SESSION: Technology Aspects
Rapid requirements checks with requirements smells: two case studies
Rapidly locating and understanding errors using runtime monitoring of architecture-carrying code
Building blocks for continuous experimentation
SESSION: Process Aspects
Supported approach for agile methods adaptation: an adoption study
Rugby: an agile process model based on continuous delivery
Scrum for cyber-physical systems: a process proposal
SESSION: Cross-Cutting Concerns
Personalised continuous software engineering
RCoSE '18- Proceedings of the 4th International Workshop on Rapid Continuous Software Engineering
SESSION: Tooling and applications
Designing a next-generation continuous software delivery system: concepts and architecture
Continuous Integration and Continuous Delivery are established practices in modern agile software development. The DevOps movement adapted these practices and places the deployment pipeline at its heart as one of the main requirements to automate the software development process and to deliver and operate software in a more robust way with higher quality.
