A few notes about our legacy data

We regularly receive notification that DOAJ data has been used for analysis by publishers, librarians, students, technologists, bloggers and many others. That the data is central to so many studies continues to reinforce the importance of the DOAJ in the open access movement. We are confident that, once our current upgrade is complete and all the existing journals have been re-evaluated, DOAJ will provide data of a much higher quality than the “old” DOAJ: data that is updated more frequently and offers a previously unseen level of granularity. It will be a dataset monitored by a large, international network of Associate Editors and Editors who consistently check and review it.

That the data is so regularly used places a responsibility at DOAJ’s door to ensure that the level of data quality is high. This is a responsibility that we take seriously and so I thought it worth clarifying a few points about the DOAJ data.

 

“This site is undergoing maintenance”

The data in the DOAJ database has been collected over a 10-year period. In those 10 years, not only has the data been through several migrations and transformations, but DOAJ has also grown from 300 to just short of 10 000 journals. That rate of growth is increasing year on year. This means that we have a large amount of legacy data.

In 2013, we announced that, in response to the changing nature of open access, we would significantly change the inclusion criteria for journals to be listed in the DOAJ and develop our back-end systems to match the resulting dramatic increase in workload. The community was involved as we sought opinion on the changes we should make. The changes needed amounted to a huge piece of development which required certain activities to be put on hold. Additionally, DOAJ migrated to a new platform in December 2013, so the routine activities of adding journals, removing journals and adding article metadata had to be paused. (Adding new journals was eventually on hold for just over 4 months; removing journals for just under 3.) It wasn’t without good reason though. DOAJ was migrated to an open source, standards-based, stable database hosted by our technical partners Cottage Labs. This also necessitated a substantial clean-up of the legacy data.

As a result of such a huge project, the usual level of data maintenance by our Editorial Team decreased. It didn’t stop – during the first quarter of 2014, 92 journals were earmarked for removal – but for a few months the public view of DOAJ remained relatively static because the usual weeding and refining was on hold. There are still areas of the DOAJ data that we know need to be reviewed and corrected.

Previously, not all information was compulsory

A publisher applying for a journal to be included in the old DOAJ was only required to provide 6 initial pieces of information. Once accepted, the publisher was encouraged to return to the site to provide further information about the journal. One such piece of information was the author processing charge (APC). This is clearly an important piece of information that authors, in particular, like to know up front. Therefore we took the opportunity, when designing the new application form, to raise the visibility of this information by making it a compulsory question on application. Naturally, this means that our legacy data has holes in it which need to be filled. All the new journals applying for inclusion after March 19th 2014 have already answered this question. Once we start the re-application process, the ~9700 existing journals in DOAJ will have to answer it too. This process is scheduled to begin in the 3rd quarter of 2014.

All DOAJ data is publisher-provided

While we can force an answer to a question in a web form, we cannot force someone to return to DOAJ and update their information. Of course, information changes over time and often we find that publishers have forgotten to update us. Our system of regular review – our aim is that every journal be reviewed at least once a year – is designed to catch these changes and correct them as quickly as possible. Our new network of Associate Editors will do this more efficiently and proactively than ever before, but we still encourage the community to get in touch when it spots information that seems to differ from what DOAJ shows. The community can play an important role as our eyes and ears.

 

We first announced the start of our transition period back in October 2013. Since then we have been very open, not only about our progress and development plans but also about the effects that the work would have on the DOAJ itself. Hopefully, with this post I have given you a little more detail as to why there may be inaccuracies in the DOAJ, what we have done to address those and what we will continue to do as our database develops.

We appreciate the patience and support from the community! As always, do please get in touch should you have questions or comments.

 
