Google Summer of Code 2019 FINAL REPORT

My Google Summer of Code (GSOC) project was focused on “Implementing split view” in gnome-gitg. This blog post serves as the final submission for my Google Summer of Code project.

The project has 4 important parts that needed to be implemented.

  1. Hide split view where it does not make sense, i.e. when the diff is of images, binaries, or other non-text content. (Completed)
  2. Investigate whether Meld-like “diff bar curves” can be ported to Gitg. (Curves completely implemented; the Gitg port is partially complete, and algorithm tuning for corner cases remains)
  3. Make a tool application to aid diff bar curve development. (Completed)
  4. Three-way split view for cases like a three-way merge. (Partially complete: the UI is ready, but we discovered at a later stage that the back end needs some major improvements to make way for three-way split view)

And to put everything together cleanly :-

  • Refactor existing code. (Partially complete)

I am happy to say that most of the parts have been realized and will soon land in the Gitg split view MR, while some of them have already landed.

My workflow was distributed: there is a main split view MR, !18, and all of my work goes towards improving this MR so that it can finally be merged into master.

Work done :-

  • Hide split view when it does not make sense !85.
  • Refactor existing code !102.
  • Diff bar drawing estimation !9. Here we explored how to draw diff bar curves as Meld does. The result is quite accurate, but some corner cases point to the conclusion that it’s better to keep the diff bar as an experimental feature for the time being. (see commit)
    • A complete analytical GUI Flatpak application to tune and aid development of the diff bar curve drawing and implementation logic !9.
  • Refactor the existing code-based UI into an XML-based UI (Gtk.Builder), which makes more sense, and introduce a stack switcher !101.

As shown in the video, split view only makes sense for text and not for things like images. The video also shows the stack switcher for unified and split view.

Remaining work :-

  1. Polish diff bar curve joining algorithm for close corner cases. (!5)
  2. Three-way back end implementation. (!98)

Challenges

Diff bar curves

The diff bar curve joining algorithm is a custom algorithm that tries to solve our problem of joining multiple single-line diff curves into one nice curve.

Here for every addition and deletion we get line data like

old line : -1

new line : 1

type : GGIT_DIFF_LINE_ADDITION

We follow this data and draw the curves. But a problem arises here: say we have an addition from lines 1 to 6, so a single curve should be drawn from line 1 to line 6. Instead, 6 separate curves are drawn, one per line, because the line callback gives data about each line separately. It is really challenging to join the related diff lines so that they make more sense and consume less memory overall, by reducing the number of dynamic variables created for each line. We hit about 80-90% accuracy and easily handle all the normal cases, but there are some complex cases where the drawn curves can only be described as a messy combination of deletions and additions. That’s where the algorithm needs polishing.
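
To make the joining idea concrete, here is a minimal sketch in Python (not the actual Gitg/Vala code). It assumes each line callback yields an old line number, a new line number and a type, as in the data shown above; the `DiffLine` class and the `join_lines` function are hypothetical names used only for illustration, and the type constants are plain string labels.

```python
from dataclasses import dataclass
from typing import List, Tuple

# String labels standing in for the line types reported by the callback.
ADDITION = "GGIT_DIFF_LINE_ADDITION"
DELETION = "GGIT_DIFF_LINE_DELETION"


@dataclass
class DiffLine:
    """Hypothetical container for the data one line callback delivers."""
    old_line: int
    new_line: int
    type: str


def join_lines(lines: List[DiffLine]) -> List[Tuple[str, int, int]]:
    """Merge consecutive lines of the same type into (type, start, end) ranges."""
    ranges: List[Tuple[str, int, int]] = []
    for line in lines:
        if line.type == ADDITION:
            number = line.new_line   # additions are numbered in the new file
        elif line.type == DELETION:
            number = line.old_line   # deletions are numbered in the old file
        else:
            continue                 # context lines draw no curve
        last = ranges[-1] if ranges else None
        if last and last[0] == line.type and last[2] + 1 == number:
            # Same type and directly adjacent: extend the previous range.
            ranges[-1] = (last[0], last[1], number)
        else:
            ranges.append((line.type, number, number))
    return ranges


# Six single-line additions collapse into one range, i.e. one curve.
six_additions = [DiffLine(-1, n, ADDITION) for n in range(1, 7)]
print(join_lines(six_additions))  # [('GGIT_DIFF_LINE_ADDITION', 1, 6)]
```

This sketch only looks at the previous range, so it covers the normal cases; it does nothing special for the complex mixes of deletions and additions described above.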

Comparison:- Without Diff Joins (left) and With Diff Joins (right)

To mention a corner case, notice the three-way Diferencia diff shown later in this post. If one looks at the diff curve carefully, the right addition-type diff curve is incorrect, or more accurately, has bad accuracy. This is because the current algorithm to join diffs did not successfully separate the two diff hunks for “hello” and “world”. The corner case arises because, to process the diffs, they first need to be filtered according to their types, which makes it harder for the code to differentiate hunks and find the starting point of each hunk. The algorithm takes all the diff data at once and only then processes it, which leaves the dynamic hunk data unmapped from the line diff data. With further processing this kind of corner case could be solved by, for example, providing the post-processing function with hunk data whenever it changes and restarting the rendering process for the current hunk information. This is only a proposal and will probably not work as is, but it does suggest to some extent that the issue can be solved.
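
As a rough illustration of keeping hunk information attached to the line data, the sketch below (Python, not Gitg code) joins adjacent lines only when they share both a hunk and a type. The tuple layout `(hunk_id, line_type, line_number)` and the function name `join_within_hunks` are assumptions made for this example, and the hunk ids are made up.

```python
from itertools import groupby
from typing import Iterable, List, Tuple

Line = Tuple[int, str, int]          # (hunk_id, line_type, line_number)
Range = Tuple[int, str, int, int]    # (hunk_id, line_type, start, end)


def join_within_hunks(lines: Iterable[Line]) -> List[Range]:
    """Join adjacent lines into ranges without ever crossing a hunk boundary."""
    ranges: List[Range] = []
    # Consecutive lines with the same hunk and type form one candidate group.
    for (hunk_id, line_type), group in groupby(lines, key=lambda l: (l[0], l[1])):
        numbers = [number for _, _, number in group]
        start = prev = numbers[0]
        for number in numbers[1:]:
            if number != prev + 1:   # gap inside the group: close the range
                ranges.append((hunk_id, line_type, start, prev))
                start = number
            prev = number
        ranges.append((hunk_id, line_type, start, prev))
    return ranges


# Two additions with adjacent line numbers but different (made-up) hunk ids
# stay separate, unlike in a join that only looks at type and line number.
lines = [(0, "addition", 1), (1, "addition", 2), (1, "addition", 3)]
print(join_within_hunks(lines))
# [(0, 'addition', 1, 1), (1, 'addition', 2, 3)]
```

Whether hunk ids can be delivered to the post-processing step this way in Gitg is exactly the open question; the sketch only shows what the joining would look like if they were.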

Three-way merge

Gitg’s existing code is not compatible with three-way merge, so two parts are needed to realize it. The first is implementing the necessary changes in the front end, and the other is the back end behind that front end.

Gitg’s current state in case of a three-way merge

As mentioned at the start of this post, the back end is the challenging part here, and implementing it requires existing Gitg code to go through many design changes. To understand how it could be implemented, I read a lot of existing code to see how diff is actually implemented in Gitg, and learnt that Gitg renders each file separately from a diff object.

Here is the comparison between the existing three-way state in Gitg and the work done for three-way merge in my project. Earlier we had to choose which parent to diff against, while here we keep one parent on the left and one on the right. Notice that in the case of a three-way merge, the option to choose parents disappears.

The following image shows, with an example, that three-way is possible in Diferencia and can also be implemented in Gitg. Diferencia is the debugging tool developed with the help of my mentor Alberto Fanjul to test the algorithms for rendering curves, handling diffs, and three-way merge, so that they can easily be ported to Gitg.

Three-way diff debugging in Diferencia using the joint-diffs algorithm.

Now, a diff object represents the changes made to a given file between a commit and its parent, or older file. That’s the existing design for now: it’s based upon a single parent and child pair. The back end implementation of three-way merge, however, requires 2 such diff pairs. I was able to extract this information from the existing code, but the challenge is to render it. This will require an entirely new design for rendering the file diffs. Another place that needs attention is how to handle the files common to both diff pairs versus the uncommon ones, as three-way view would not make sense for files that only have one parent.
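
The common versus uncommon file question can be sketched as a simple partition. The Python below is only an illustration under assumptions: each parent/child diff is modelled as a dictionary mapping a file path to its per-file diff data, and `partition_files` is a hypothetical helper, not an existing Gitg function.

```python
from typing import Dict, List, Tuple


def partition_files(
    diff_parent1: Dict[str, object],
    diff_parent2: Dict[str, object],
) -> Tuple[List[str], List[str], List[str]]:
    """Split file paths into those present in both diffs and those in only one."""
    paths1, paths2 = diff_parent1.keys(), diff_parent2.keys()
    common = sorted(paths1 & paths2)        # candidates for three-way view
    only_first = sorted(paths1 - paths2)    # changed against parent 1 only
    only_second = sorted(paths2 - paths1)   # changed against parent 2 only
    return common, only_first, only_second


# Placeholder strings stand in for the real per-file diff data.
against_parent1 = {"a.txt": "diff-a1", "b.txt": "diff-b1"}
against_parent2 = {"b.txt": "diff-b2", "c.txt": "diff-c2"}
print(partition_files(against_parent1, against_parent2))
# (['b.txt'], ['a.txt'], ['c.txt'])
```

Files in the first list could be shown in three-way view, while the other two lists would fall back to the ordinary two-pane or unified view.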

Final thoughts

There were a lot of things I learnt from my experience while coding for my GSOC project. Some were skills, while some were things that I found very interesting and contrary to what I used to think :^)

Here are some skills I learnt :-

  • GNOME workflow.
  • How to do upstream contribution.
  • Advanced git concepts.
  • Code documentation.
  • Meson build system.
  • Flatpak’s building and distribution system.
  • GTK.
  • Advanced Vala concepts.
  • GitLab’s CI/CD.
  • Basics of what Docker is and how to use it.
  • Exposure to the libgit2 code base, which is one of our most important dependencies.
  • Using patches to fix dependencies.
  • UI making using Gtk.Builder and Glade.
  • Code linters such as uncrustify and how to use them.
  • Git diff mechanism.
  • Good coding practices.

etc.

The most important lesson for me was that “FOSS is more about people than code”. It’s really overwhelming to see how communities support each other and make wonderful software for the betterment of society, seeking freedom and power in the hands of their USERS.

Google Summer of Code allowed me to contribute 3 months full-time to my community. As a newcomer, I not only advanced in coding and development, but also got a chance to take a wider look at everything going on in the GNOME community. This would certainly have been impossible if I were doing something else, like school or a non-open-source internship with contributions on weekends. And finally, as a result, I was able to contribute a little to the engagement team and make new, extremely talented friends!!

I also contributed to other initiatives, like “Newcomers Initiative” where we try to make sure newcomers can easily solve their early issues and do not face any difficulties in tools and setup.

I feel extremely fortunate and grateful to be mentored by Alberto Fanjul who is not just a talented engineer and mentor but also a rare soul who is kind and understands what needs to be done and how it will be done. He had a nice vision of leading and designing the plan for my project.

He mentored me very nicely and was always ready to address my queries, from the silliest to the most obvious.

I would also like to appreciate the efforts of the GSOC admins at the GNOME Foundation, who organized this program in GNOME and made everything go so smoothly.

I honestly can’t find a better organization than GNOME; people here are so nice and supportive. I would also like to say a huge thanks to the GNOME Foundation for giving me the chance to do this internship here.

Special thanks

A huge shout out to these people and friends who supported me in this journey :-

Alexander Mikhaylenko, Adwait Rawat, Felipe Borges, Sriram Ramkrishna, Kristi Progri, Carlos Soriano, Alexandre Franke, Umang Jain, Meg Ford, Nuritzi Sanchez, …..

GSOC is totally one of the biggest contributions by Google to the Open Source Communities and I can’t thank Google enough for supporting students like me and giving us this amazing opportunity.
